Abstract
An approach to thematic document classification, clustering, and investigation of document flows and collections based on domain-oriented dictionaries (DODs) is considered. It is simple enough to be used by, say, a secretary who frequently needs to classify and search large numbers of documents. However, good results require a solid technology for constructing and maintaining the DODs; this task is to be performed by experts or advanced users. A DOD represents a specific subject topic and is constructed by analyzing a collection of documents that represent this topic and were selected by a group of experts. The toolkit presented here facilitates the development of a hierarchical system of DODs by applying a set of heuristic criteria to select keywords from such a document collection representing one subject domain. The paper illustrates, with examples, the application of DODs developed with the toolkit to information retrieval.
Work done under partial support of CONACyT, Mexico
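The idea of classifying a document against a set of DODs can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the dictionary contents, the keyword weights, the tokenizer, and the scoring rule (weighted keyword frequency normalized by document length) are hypothetical and are not taken from the paper, which does not specify its heuristic criteria in the abstract.

```python
import re
from collections import Counter

# Hypothetical DODs: topic name -> {keyword: weight}.
# The topics, keywords, and weights are illustrative only.
DODS = {
    "medicine": {"patient": 1.0, "clinical": 1.0, "therapy": 0.8},
    "finance": {"bank": 1.0, "credit": 0.9, "market": 0.7},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score(text, dod):
    """Weighted keyword frequency of one DOD in the text (assumed rule)."""
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(weight * counts[word] for word, weight in dod.items()) / total


def classify(text, dods):
    """Return the best-matching topic (or None) and all topic scores."""
    scores = {name: score(text, dod) for name, dod in dods.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0 else None), scores


if __name__ == "__main__":
    topic, scores = classify(
        "The bank raised credit limits as the market fell.", DODS
    )
    print(topic, scores)
```

In this sketch a document that matches no dictionary keyword is left unclassified rather than forced into a topic; a real system would presumably also need a threshold and a hierarchy of dictionaries, as the abstract suggests.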
Copyright information
© 2000 Springer-Verlag Berlin · Heidelberg
About this paper
Cite this paper
Makagonov, P.P., Alexandrov, M.A., Sboychakov, K. (2000). A Toolkit for Development of the Domain-Oriented Dictionaries for Structuring Document Flows. In: Kiers, H.A.L., Rasson, JP., Groenen, P.J.F., Schader, M. (eds) Data Analysis, Classification, and Related Methods. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-59789-3_13
DOI: https://doi.org/10.1007/978-3-642-59789-3_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67521-1
Online ISBN: 978-3-642-59789-3
eBook Packages: Springer Book Archive