Structural Queries in Electronic Corpora

  • Chapter
Electronic Multimedia Publishing

Abstract

We present a methodology for automatically constructing structural hyperlinks in electronic technical corpora. A structural hyperlink connects document components that have specified structural properties and similar word-based content. Our approach supports queries posed in terms of keywords as well as structural segments such as definitions and figures.
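
To make the idea of a query that combines keywords with a structural segment type concrete, here is a minimal Python sketch. It is not the authors' method: the segment representation, the kind labels, the `structural_query` function, the cosine measure over raw term counts, and the threshold value are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical segments: (segment_id, kind, text). The kinds are
# illustrative labels, not a taxonomy taken from the chapter.
SEGMENTS = [
    ("d1", "definition", "A hyperlink is a connection between two document components."),
    ("f1", "figure", "Figure 1: hyperlink graph for a document collection."),
    ("p1", "paragraph", "Robots cooperate to move furniture."),
]


def term_vector(text):
    """Return raw term counts over lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def structural_query(segments, keywords, kind, threshold=0.1):
    """Return ids of segments of the given kind whose text is similar to the keywords.

    The structural constraint is the `kind` filter; the content constraint is
    the cosine threshold (an assumed value). Results are sorted by score,
    highest first.
    """
    query = term_vector(keywords)
    scored = [
        (cosine(query, term_vector(text)), seg_id)
        for seg_id, seg_kind, text in segments
        if seg_kind == kind
    ]
    return [seg_id for score, seg_id in sorted(scored, reverse=True) if score >= threshold]
```

In this sketch, `structural_query(SEGMENTS, "hyperlink document", "definition")` considers only definition segments, so the figure caption is excluded even though it shares the same keywords.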

Copyright information

© 1998 Springer Science+Business Media New York

About this chapter

Cite this chapter

Rus, D., Allan, J. (1998). Structural Queries in Electronic Corpora. In: Makedon, F., Rebelsky, S.A. (eds) Electronic Multimedia Publishing. Springer, Boston, MA. https://doi.org/10.1007/978-0-585-34906-0_4

  • DOI: https://doi.org/10.1007/978-0-585-34906-0_4

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4757-8271-4

  • Online ISBN: 978-0-585-34906-0

  • eBook Packages: Springer Book Archive
