On Text Mining Algorithms for Automated Maintenance of Hierarchical Knowledge Directory

  • Han-joon Kim
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4092)


This paper presents a series of text-mining algorithms for managing a knowledge directory, one of the most crucial problems in building today's knowledge management systems. In future systems, a directory in which knowledge objects are automatically classified should evolve so that it keeps providing a good indexing service as the knowledge collection grows or its usage changes. One challenging issue is how to combine manual and automatic organization facilities so that a user can flexibly organize acquired knowledge within the hierarchical structure over time. To this end, I propose three algorithms based on text-mining technologies: semi-supervised classification, semi-supervised clustering, and automatic directory building. Experiments on controlled document collections show that the proposed approach significantly supports the hierarchical organization of a large electronic knowledge base with minimal human effort.
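The abstract names semi-supervised classification without describing its internals. As a hedged illustration only, the sketch below shows one widely used form of that technique: expectation-maximization (EM) over a multinomial naive Bayes model, where a few labeled documents seed the model and unlabeled documents contribute soft (probabilistic) labels. It is not presented as the paper's algorithm. The function names (`train_nb`, `posterior`, `em_naive_bayes`, `classify`), the whitespace tokenizer, the Laplace smoothing constant, and the toy "db"/"ml" categories are all hypothetical choices made for this example.

```python
import math
from collections import Counter

# Hypothetical sketch: EM + multinomial naive Bayes for semi-supervised
# text classification. Not the paper's algorithm; names are illustrative.

def train_nb(docs, weights, classes, vocab, alpha=1.0):
    """M-step: fit naive Bayes from per-document class weights (hard or soft)."""
    prior = {c: alpha for c in classes}
    word_counts = {c: Counter() for c in classes}
    for doc, w in zip(docs, weights):
        for c, p in w.items():
            if p == 0:
                continue
            prior[c] += p
            for word, n in doc.items():
                word_counts[c][word] += p * n
    total = sum(prior.values())
    log_prior = {c: math.log(prior[c] / total) for c in classes}
    log_cond = {}
    for c in classes:
        # Laplace smoothing so unseen (class, word) pairs keep nonzero mass
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_cond[c] = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
    return log_prior, log_cond

def posterior(doc, model, classes):
    """E-step: P(class | doc), normalized in log space for stability."""
    log_prior, log_cond = model
    scores = {
        c: log_prior[c] + sum(n * log_cond[c][w] for w, n in doc.items() if w in log_cond[c])
        for c in classes
    }
    m = max(scores.values())
    exp = {c: math.exp(s - m) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}

def em_naive_bayes(labeled, unlabeled, iterations=10):
    """labeled: list of (text, label); unlabeled: list of text."""
    classes = sorted({y for _, y in labeled})
    lab_docs = [Counter(t.lower().split()) for t, _ in labeled]
    unl_docs = [Counter(t.lower().split()) for t in unlabeled]
    vocab = set()
    for d in lab_docs + unl_docs:
        vocab.update(d)
    # Labeled documents keep fixed one-hot weights throughout
    lab_weights = [{c: 1.0 if c == y else 0.0 for c in classes} for _, y in labeled]
    model = train_nb(lab_docs, lab_weights, classes, vocab)  # labeled-only start
    for _ in range(iterations):
        unl_weights = [posterior(d, model, classes) for d in unl_docs]  # E-step
        model = train_nb(lab_docs + unl_docs, lab_weights + unl_weights,
                         classes, vocab)  # M-step on labeled + soft-labeled data
    return model, classes

def classify(text, model, classes):
    post = posterior(Counter(text.lower().split()), model, classes)
    return max(post, key=post.get)

labeled = [("database query index", "db"), ("neural network training", "ml")]
unlabeled = ["query optimization index tables", "network layers training gradient"]
model, classes = em_naive_bayes(labeled, unlabeled)
print(classify("index tables query", model, classes))  # expected: db
```

Here the word "tables" never appears in a labeled document; the classifier still associates it with "db" because EM assigned the unlabeled document containing it mostly to that class. This is the general mechanism by which unlabeled data can reduce the labeling effort the abstract refers to.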







Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Han-joon Kim, Department of Electrical and Computer Engineering, University of Seoul, Korea
