Abstract
The main aim of communication is to transfer information from one part of the world to another. Nowadays, information is available in the form of documents or files created as requirements arise. The more requirements there are, the larger the documents become. Because both their creation and their storage are ad hoc, the documents end up unstructured, and dealing with them becomes difficult. For ease of processing, frequently required data should follow a consistent pattern. Unfortunately, in most cases we face problems such as erroneous data retrieval, modification anomalies, or long retrieval times even for a single document. Unstructured document categorization has emerged as a solution to these problems. It is a vast field that offers solutions for many types of document categorization. In essence, unstructured documents are categorized according to given constraints. In this paper we highlight the most popular techniques in the field, such as text and data mining, genetic algorithms, lexical chaining, and binarization methods, with the goal of achieving the desired categorization of unstructured documents.
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bhattacharyya, D. et al. (2009). In Search of Unstructured Documents Categorization. In: Kim, Th., Yang, L.T., Park, J.H., Chang, A.CC., Vasilakos, T., Yeo, SS. (eds) Advances in Communication and Networking. FGCN 2008. Communications in Computer and Information Science, vol 27. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10236-3_10
DOI: https://doi.org/10.1007/978-3-642-10236-3_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10235-6
Online ISBN: 978-3-642-10236-3
eBook Packages: Computer Science, Computer Science (R0)