A Semantic Text Retrieval for Indonesian Using Tolerance Rough Sets Models

  • Gloria Virginia
  • Hung Son Nguyen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8988)


Previous research on the Tolerance Rough Sets Model (TRSM) has followed the rational approach of the AI perspective. This article presents studies that take the contrary path, i.e. a cognitive approach, toward the objective of a modular framework for a semantic text retrieval system based on TRSM, specifically for Indonesian. In addition to the proposed framework, this article proposes three methods based on TRSM: an automatic tolerance value generator, thesaurus optimization, and lexicon-based document representation. All methods were developed using our own corpus, the ICL-corpus, and evaluated on an available Indonesian corpus, the Kompas-corpus. A semantic information retrieval system endeavors to retrieve information, not merely terms with similar meaning. This article is a first small step toward that objective.


Information retrieval, Tolerance rough sets model, Text mining



This work is partially supported by (1) Specific Grant Agreement Number-2008-4950/001-001-MUN-EWC from the European Union Erasmus Mundus “External Cooperation Window” EMMA, (2) the National Centre for Research and Development (NCBiR) under Grant No. SP/I/1/77065/10 by the Strategic Scientific Research and Experimental Development Program: “Interdisciplinary System for Interactive Scientific and Scientific-Technical Information”, (3) a grant from the Ministry of Science and Higher Education of the Republic of Poland, N N516 077837, and (4) a grant from Yayasan Arsari Djojohadikusumo (YAD) based on Addendum Agreement No. 029/C10/UKDW/2012. We thank the Faculty of Computer Science, University of Indonesia, for permission to use the CS stemmer.



Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. Informatics Engineering Department, Duta Wacana Christian University, Yogyakarta, Indonesia
  2. Institute of Mathematics, University of Warsaw, Warsaw, Poland
