A Toolkit for Development of the Domain-Oriented Dictionaries for Structuring Document Flows

Makagonov, Pavel P.; Alexandrov, Mikhail A.; Sboychakov, Konstantin

doi:10.1007/978-3-642-59789-3_13


Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)


Abstract

An approach to thematic document classification, clustering, and the investigation of document flows and collections based on domain-oriented dictionaries (DODs) is considered. The approach is simple enough to be used by, say, a secretary who frequently needs to classify and search large numbers of documents. For good results, however, it requires a solid technology for constructing and maintaining the DODs; this task is to be performed by experts or advanced users. A DOD represents a specific subject topic and is constructed by analyzing a collection of documents that represent this topic and were selected by a group of experts. The toolkit facilitates the development of a hierarchical system of DODs by applying a set of heuristic criteria for selecting keywords from such a document collection representing one subject domain. The paper illustrates, with examples, the application of DODs developed with the toolkit to information retrieval.
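
The abstract does not state which heuristic criteria the toolkit applies, so the following Python sketch is only an illustration of the general idea under stated assumptions. It selects keywords for one domain using two assumed criteria (the share of domain documents containing a word, and the word's relative frequency against a background collection) and then assigns a document to the DOD whose keywords cover the largest share of its tokens. The function names (build_dod, score, classify), the thresholds, and the scoring rule are hypothetical and are not taken from the paper.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_dod(domain_docs, background_docs, min_doc_share=0.5, min_ratio=2.0):
    """Select keywords for one domain with two illustrative criteria.

    Both criteria and both thresholds are assumptions made for this
    sketch, not the heuristic criteria of the toolkit in the paper.
    """
    domain_tokens = [tokenize(d) for d in domain_docs]
    background = Counter(t for d in background_docs for t in tokenize(d))
    domain = Counter(t for tokens in domain_tokens for t in tokens)
    doc_freq = Counter(t for tokens in domain_tokens for t in set(tokens))

    domain_total = sum(domain.values())
    background_total = sum(background.values())
    dod = {}
    for word, count in domain.items():
        # Assumed criterion 1: the word occurs in a large enough share
        # of the expert-selected domain documents.
        if doc_freq[word] / len(domain_docs) < min_doc_share:
            continue
        # Assumed criterion 2: the word is relatively more frequent in
        # the domain than in the background (add-one smoothing).
        rel_domain = count / domain_total
        rel_background = (background[word] + 1) / (background_total + 1)
        ratio = rel_domain / rel_background
        if ratio >= min_ratio:
            dod[word] = ratio
    return dod


def score(document, dod):
    """Share of the document's tokens that are keywords of the DOD."""
    tokens = tokenize(document)
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in dod) / len(tokens)


def classify(document, dods):
    """Return the name of the DOD with the highest score."""
    return max(dods, key=lambda name: score(document, dods[name]))
```

In this sketch each DOD is a plain mapping from keyword to weight, and a hierarchical system of DODs could be approximated by building further dictionaries from sub-collections of one domain. That extension is likewise an assumption rather than the design described by the authors.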

Work done under partial support of CONACyT, Mexico



Author information

Authors and Affiliations

Moscow Mayor’s Directorate, Moscow City Government, Novi Arbat 36, Moscow, 121205, Russia
Pavel P. Makagonov & Konstantin Sboychakov
Center for Computing Research (CIC), National Polytechnic Institute (IPN), Av. Juan de Dios Batiz, C.P. 07738, Mexico DF, Mexico
Mikhail A. Alexandrov


Editor information

Editors and Affiliations

University of Groningen Heymans Institute (PA), Grote Kruisstraat 2/1, NL-9712 TS, Groningen, The Netherlands
Henk A. L. Kiers
Facultés Universitaires Notre-Dame de la Paix, University of Namur, Rempart de la Vierge, 8, B-5000, Namur, Belgium
Jean-Paul Rasson (Directeur du Department de Mathématique)
Data Theory Group Department of Education, Leiden University, P.O. Box 9555, NL-2300 RB, Leiden, The Netherlands
Patrick J. F. Groenen
Lehrstuhl für Wirtschaftsinformatik III Schloß, University of Mannheim, D-68131, Mannheim, Germany
Martin Schader


About this paper

Cite this paper

Makagonov, P.P., Alexandrov, M.A., Sboychakov, K. (2000). A Toolkit for Development of the Domain-Oriented Dictionaries for Structuring Document Flows. In: Kiers, H.A.L., Rasson, JP., Groenen, P.J.F., Schader, M. (eds) Data Analysis, Classification, and Related Methods. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-59789-3_13


DOI: https://doi.org/10.1007/978-3-642-59789-3_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67521-1
Online ISBN: 978-3-642-59789-3
eBook Packages: Springer Book Archive
