Document Management and the Development of Information Spaces

  • Conference paper
Classification — the Ubiquitous Challenge

Abstract

This paper shows how formal document structures, such as paragraphs and tables, can be used to extract information during the automatic recognition of the contents of OpenOffice text documents and HTML documents as part of a document management project. Using a natural language processing chain and a wrapping procedure, formal graphs can be created that structure the document-related information space according to a given information model. A combined text and layout analysis, carried out with open source components, aims to represent information as a semantic network in a formal and visualizable manner. Uniting document-related information spaces into thematic domains yields scalable ways of retrieving information and processing knowledge.
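As a rough illustration of the wrapping idea for HTML input, the following minimal sketch turns paragraphs and table cells into graph edges stored as (subject, predicate, object) triples. It is not the implementation described in the paper: the class name `StructureWrapper`, the function `extract_triples`, the node identifiers, and the predicate names (`hasParagraph`, `hasTable`, `hasRow`, `hasCell`, `hasText`) are all hypothetical, and the natural language processing chain is left out entirely. Only the Python standard library is assumed.

```python
from html.parser import HTMLParser


class StructureWrapper(HTMLParser):
    """Collects paragraphs and table cells as (subject, predicate, object) triples.

    Hypothetical sketch: node ids and predicate names are illustrative only.
    """

    def __init__(self, doc_id):
        super().__init__()
        self.doc_id = doc_id
        self.triples = []
        self._tag = None    # text-bearing element currently being read
        self._text = []     # text fragments collected for that element
        self._n_par = 0     # paragraphs seen so far
        self._n_table = 0   # tables seen so far
        self._n_row = 0     # rows seen in the current table
        self._table = None  # node id of the current table, if any
        self._row = None    # node id of the current row, if any

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._n_table += 1
            self._n_row = 0
            self._table = f"{self.doc_id}/table{self._n_table}"
            self.triples.append((self.doc_id, "hasTable", self._table))
        elif tag == "tr" and self._table:
            self._n_row += 1
            self._row = f"{self._table}/row{self._n_row}"
            self.triples.append((self._table, "hasRow", self._row))
        elif tag in ("p", "td", "th"):
            self._tag, self._text = tag, []

    def handle_data(self, data):
        if self._tag:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            # Collapse runs of whitespace left over from the markup.
            text = " ".join("".join(self._text).split())
            if tag == "p":
                self._n_par += 1
                node = f"{self.doc_id}/p{self._n_par}"
                self.triples.append((self.doc_id, "hasParagraph", node))
                self.triples.append((node, "hasText", text))
            elif self._row:
                self.triples.append((self._row, "hasCell", text))
            self._tag = None
        elif tag == "table":
            self._table = self._row = None


def extract_triples(html, doc_id="doc"):
    """Parse an HTML string and return the structural triples found in it."""
    parser = StructureWrapper(doc_id)
    parser.feed(html)
    parser.close()
    return parser.triples


if __name__ == "__main__":
    sample = "<p>Invoice for Alice</p><table><tr><td>Pen</td><td>2</td></tr></table>"
    for triple in extract_triples(sample):
        print(triple)
```

Triples of this kind could then be merged across documents into a larger graph, which is one simple way to picture the uniting of document-related information spaces. Nested structures, such as a paragraph inside a table cell, are not handled carefully in this sketch.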

Copyright information

© 2005 Springer-Verlag Berlin · Heidelberg

About this paper

Cite this paper

Rist, U. (2005). Document Management and the Development of Information Spaces. In: Weihs, C., Gaul, W. (eds) Classification — the Ubiquitous Challenge. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-28084-7_62
