Investigating the Potential of Rough Sets Theory in Automatic Thesaurus Construction

  • Gloria Virginia
  • Hung Son Nguyen
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 157)


This paper presents the results of an initial study on applying rough sets theory to generate a thesaurus automatically from a corpus. The main objective of the study is to investigate the relation between keywords (defined by human experts as highly related to a particular topic) and the sets generated on the basis of rough sets theory. The analysis compares all of the available sets. We conclude that applying rough sets theory is a reasonable way to construct a thesaurus automatically, as it can enrich a concept and proved able to cover the keywords given by the human experts.
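To make the idea of "sets generated based on rough sets theory" concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say which rough set variant is used, so the sketch assumes a tolerance rough set model in which two terms are related when they co-occur in at least `theta` documents. The function names, the toy corpus, the threshold, and the keyword set are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to itself plus every term it co-occurs with
    in at least `theta` documents (assumed tolerance relation)."""
    cooc = Counter()
    for doc in docs:
        # Count each unordered pair of distinct terms once per document.
        for a, b in combinations(sorted(set(doc)), 2):
            cooc[(a, b)] += 1
    terms = set().union(*docs)
    classes = {t: {t} for t in terms}
    for (a, b), n in cooc.items():
        if n >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def lower_approximation(classes, X):
    """Terms whose whole tolerance class lies inside X."""
    X = set(X)
    return {t for t, cls in classes.items() if cls <= X}


def upper_approximation(classes, X):
    """Terms whose tolerance class shares at least one term with X."""
    X = set(X)
    return {t for t, cls in classes.items() if cls & X}


# Hypothetical toy corpus: each document is a list of terms.
docs = [
    ["rough", "set", "thesaurus"],
    ["rough", "set", "cluster"],
    ["thesaurus", "query", "expansion"],
    ["query", "expansion", "cluster"],
]
classes = tolerance_classes(docs, theta=2)

concept = {"rough", "query"}            # terms representing a topic
enriched = upper_approximation(classes, concept)
core = lower_approximation(classes, concept)

# Hypothetical expert keywords; check whether the enriched set covers them.
expert_keywords = {"set", "expansion"}
covered = expert_keywords <= enriched
```

In this toy setting the upper approximation extends the concept with terms that frequently co-occur with it, which is one way a concept could be "enriched"; comparing such a set with expert keywords mirrors the kind of coverage check the abstract describes.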


Information Retrieval, Human Expert, Query Expansion, Document Cluster, Topic Assignment
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  1. Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Warsaw, Poland
