Extracting Information from XML Documents by Reverse Generating a DTD

  • Conference paper
EurAsia-ICT 2002: Information and Communication Technology (EurAsia-ICT 2002)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2510))

Abstract

Information contained in XML documents cannot be properly interpreted without an appropriate DTD. However, XML documents collected from the web are not always accompanied by the corresponding DTD, which makes extracting information from such sources difficult. In this study, we reverse-construct a DTD from XML sources whose DTD is unknown, and use it to extract information from the XML inputs. The DTD construction module scans input XML files in a single pass, whereas most other implementations use a two-pass approach. The modules also provide clean Java programming interfaces, so that they can be integrated seamlessly with other web applications.
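
To make the idea of building a DTD in one scan concrete, the following is a minimal sketch and not the authors' module, whose algorithm and interfaces are not given here. It assumes a deliberately permissive content model (every observed child may repeat in any order) and uses the standard SAX parser so that the input is read exactly once. The class name `DtdSketch` and the method `inferDtd` are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; not the interface described in the paper.
public class DtdSketch {

    /** What has been observed so far for one element name. */
    private static final class ElementInfo {
        final Set<String> children = new LinkedHashSet<>();
        final Set<String> attributes = new LinkedHashSet<>();
        boolean hasText = false;
    }

    /** Scans the XML text once and returns a permissive DTD for it. */
    public static String inferDtd(String xml) throws Exception {
        Map<String, ElementInfo> seen = new LinkedHashMap<>();
        Deque<String> open = new ArrayDeque<>(); // stack of currently open elements

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                ElementInfo info = seen.computeIfAbsent(qName, k -> new ElementInfo());
                for (int i = 0; i < atts.getLength(); i++) {
                    info.attributes.add(atts.getQName(i));
                }
                if (!open.isEmpty()) {
                    // Record this element as a child of its parent.
                    seen.get(open.peek()).children.add(qName);
                }
                open.push(qName);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                open.pop();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Whitespace-only runs are treated as formatting, not content.
                if (!open.isEmpty() && !new String(ch, start, length).trim().isEmpty()) {
                    seen.get(open.peek()).hasText = true;
                }
            }
        };

        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);

        // Emit one declaration per element, in order of first appearance.
        StringBuilder dtd = new StringBuilder();
        for (Map.Entry<String, ElementInfo> e : seen.entrySet()) {
            String name = e.getKey();
            ElementInfo info = e.getValue();
            String model;
            if (info.children.isEmpty()) {
                model = info.hasText ? "(#PCDATA)" : "EMPTY";
            } else {
                String names = String.join("|", info.children);
                model = info.hasText ? "(#PCDATA|" + names + ")*" : "(" + names + ")*";
            }
            dtd.append("<!ELEMENT ").append(name).append(' ').append(model).append(">\n");
            for (String a : info.attributes) {
                // Assumption: every attribute is optional character data.
                dtd.append("<!ATTLIST ").append(name).append(' ').append(a)
                   .append(" CDATA #IMPLIED>\n");
            }
        }
        return dtd.toString();
    }
}
```

Calling `DtdSketch.inferDtd(xmlText)` returns the declarations as a string. The sketch trades precision for simplicity: it records which children and attributes occur but not their order or cardinality, so a real generator would need a richer per-element summary to produce tighter content models.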

This work is supported in part by the Ministry of Information & Communication of Korea.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jung, JS., Oh, DI., Kong, YH., Ahn, JK. (2002). Extracting Information from XML Documents by Reverse Generating a DTD. In: Shafazand, H., Tjoa, A.M. (eds) EurAsia-ICT 2002: Information and Communication Technology. EurAsia-ICT 2002. Lecture Notes in Computer Science, vol 2510. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36087-5_37

  • DOI: https://doi.org/10.1007/3-540-36087-5_37

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-00028-0

  • Online ISBN: 978-3-540-36087-2

  • eBook Packages: Springer Book Archive
