
In Search of Unstructured Documents Categorization

  • Conference paper
Advances in Communication and Networking (FGCN 2008)

Abstract

The main aim of communication is to transfer information from one part of the world to another. Nowadays, information is available in the form of documents or files created as requirements arise, and the more requirements there are, the larger the documents become. The random manner in which these documents are created and stored leaves them unstructured, and as a result handling them becomes difficult. For ease of processing, frequently required data should follow a certain pattern. Unfortunately, we often face problems such as erroneous data retrieval, modification anomalies, or a large amount of time spent retrieving a single document. Unstructured document categorization has emerged as a solution to this situation. The field is broad and covers many solutions for various types of document categorization; in essence, unstructured documents are categorized according to a set of given constraints. In this paper we highlight the most important and popular techniques in the field, namely text and data mining, genetic algorithms, lexical chaining, and binarization methods, with the aim of achieving the desired unstructured document categorization.
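The abstract says unstructured documents are categorized "based on some given constraints" without spelling out a mechanism. As an illustration only, and not the authors' method, the sketch below shows one common text-mining approach: each document is turned into a bag-of-words term-frequency vector, compared by cosine similarity against a keyword profile per category, and the "constraint" is assumed to be a minimum similarity threshold below which a document stays uncategorized. The names `categorize` and `PROFILES`, the profile keywords, and the threshold of 0.1 are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical keyword profiles, one per category (illustrative only).
PROFILES = {
    "network": "network packet routing protocol bandwidth",
    "finance": "invoice payment account budget tax",
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_vector(text: str) -> Counter:
    """Bag-of-words term-frequency vector of one document."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-frequency vectors (0.0 if either is empty)."""
    # Counter returns 0 for missing terms, so non-shared terms add nothing.
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def categorize(doc: str, profiles: dict[str, str], threshold: float = 0.1) -> str:
    """Return the label of the most similar profile, or "uncategorized" when no
    profile reaches `threshold` (the assumed categorization constraint)."""
    vec = term_vector(doc)
    best_label, best_score = "uncategorized", threshold
    for label, profile in profiles.items():
        score = cosine(vec, term_vector(profile))
        if score >= best_score:
            best_label, best_score = label, score
    return best_label


print(categorize("Network routing protocol failed; packet loss rose and bandwidth dropped.", PROFILES))  # network
print(categorize("Lunch menu for Friday", PROFILES))  # uncategorized
```

Practical systems would more likely weight terms with TF-IDF and learn category profiles from labelled examples; the other techniques the paper names (genetic algorithms, lexical chaining, binarization) are separate approaches and are not shown here.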

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bhattacharyya, D. et al. (2009). In Search of Unstructured Documents Categorization. In: Kim, Th., Yang, L.T., Park, J.H., Chang, A.CC., Vasilakos, T., Yeo, SS. (eds) Advances in Communication and Networking. FGCN 2008. Communications in Computer and Information Science, vol 27. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10236-3_10

  • DOI: https://doi.org/10.1007/978-3-642-10236-3_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-10235-6

  • Online ISBN: 978-3-642-10236-3

  • eBook Packages: Computer Science, Computer Science (R0)
