Document Management and the Development of Information Spaces

Rist, Ulfert

doi:10.1007/3-540-28084-7_62

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

This chapter shows how formal document structures, such as paragraphs and tables, can be used to extract information during the automatic content recognition of OpenOffice text documents and HTML documents in a document management project. A natural language processing chain and a wrapping procedure produce formal graphs that structure the document-related information space according to a given information model. A combined text and layout analysis, carried out with open source components, aims to represent the information as a semantic network in a formal and visualizable manner. Uniting document-related information spaces into thematic domains yields scalable ways of retrieving information and processing knowledge.
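As a rough illustration of how formal structures such as paragraphs and tables can be turned into a graph, the following minimal sketch parses an HTML fragment with Python's standard `html.parser` module and records each paragraph, table, row, and cell as a node linked to its parent. It is not the procedure used in the chapter: the class `StructureGraphBuilder`, the function `build_graph`, the node identifiers, and the `contains` relation are all hypothetical names chosen for this example, and the sketch assumes well-formed markup in which every `<p>`, `<table>`, `<tr>`, `<td>`, and `<th>` is explicitly closed.

```python
from html.parser import HTMLParser

# Hypothetical mapping from HTML tags to structural node types.
KINDS = {"p": "paragraph", "table": "table", "tr": "row", "td": "cell", "th": "cell"}


class StructureGraphBuilder(HTMLParser):
    """Collect paragraphs and table parts as nodes of a small document graph."""

    def __init__(self):
        super().__init__()
        self.nodes = {"doc": {"type": "document", "text": ""}}
        self.edges = []        # (parent_id, relation, child_id)
        self._stack = ["doc"]  # ids of the structural elements currently open
        self._counter = 0

    def handle_starttag(self, tag, attrs):
        if tag in KINDS:
            self._counter += 1
            node_id = f"{KINDS[tag]}{self._counter}"
            self.nodes[node_id] = {"type": KINDS[tag], "text": ""}
            self.edges.append((self._stack[-1], "contains", node_id))
            self._stack.append(node_id)

    def handle_endtag(self, tag):
        # Assumes well-formed markup: each tracked tag is explicitly closed.
        if tag in KINDS and len(self._stack) > 1:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and len(self._stack) > 1:
            node = self.nodes[self._stack[-1]]
            node["text"] = (node["text"] + " " + text).strip()


def build_graph(html):
    """Return (nodes, edges) describing the structural layout of `html`."""
    builder = StructureGraphBuilder()
    builder.feed(html)
    builder.close()
    return builder.nodes, builder.edges


if __name__ == "__main__":
    sample = "<p>Intro text.</p><table><tr><th>Name</th><td>Rist</td></tr></table>"
    nodes, edges = build_graph(sample)
    for parent, relation, child in edges:
        print(parent, relation, child, repr(nodes[child]["text"]))
```

A structural graph of this kind only captures layout. Mapping it onto an information model, as the abstract describes, would additionally require the language processing and wrapping steps, which this sketch does not attempt.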

Author information

Authors and Affiliations

IABG mbH, VG15, 85521, Ottobrunn, Germany
Ulfert Rist

Editor information

Editors and Affiliations

Fachbereich Statistik, Universität Dortmund, 44221, Dortmund
Claus Weihs
Institut für Entscheidungstheorie und Unternehmensforschung, Universität Karlsruhe (TH), 76128, Karlsruhe
Wolfgang Gaul

About this paper

Cite this paper

Rist, U. (2005). Document Management and the Development of Information Spaces. In: Weihs, C., Gaul, W. (eds) Classification — the Ubiquitous Challenge. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-28084-7_62

DOI: https://doi.org/10.1007/3-540-28084-7_62
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25677-9
Online ISBN: 978-3-540-28084-2
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
