Weight Learning in TRSM-based Information Retrieval

Part of the Studies in Computational Intelligence book series (SCI, volume 541)


This chapter presents a novel approach to keyword search in Information Retrieval based on the Tolerance Rough Set Model (TRSM). The bag-of-words representation of each document is extended with additional words, which are included in the inverted index along with appropriate weights. These extension words are derived using different techniques (e.g. semantic information, word distribution), which are encapsulated in the model by a tolerance relation. Weights for the structural extension are then assigned by an unsupervised algorithm. This method, called TRSM-WL, allows us to improve retrieval effectiveness by returning documents that do not necessarily contain the words from the query. We compare the performance of the two algorithms, TRSM and TRSM-WL, on the keyword search problem over a benchmark data set.
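The abstract describes document expansion only in words. The following is a minimal sketch under stated assumptions, not the chapter's implementation. It assumes the co-occurrence tolerance relation of the classical TRSM: two words are tolerant when they appear together in at least `theta` documents. It also assumes the fixed weighting commonly used for classical TRSM extension words. It does not implement the unsupervised weight learning of TRSM-WL; the fixed formula stands in for it. All function names, the parameter `theta`, and the toy corpus are hypothetical.

```python
import math
from collections import defaultdict

def tolerance_classes(docs, theta):
    """I_theta(t) = {t} plus every term that co-occurs with t in >= theta documents."""
    cooc = defaultdict(int)
    vocab = set()
    for doc in docs:
        terms = sorted(set(doc))
        vocab.update(terms)
        for i, a in enumerate(terms):
            for b in terms[i + 1:]:
                cooc[(a, b)] += 1
                cooc[(b, a)] += 1
    classes = {t: {t} for t in vocab}
    for (a, b), count in cooc.items():
        if count >= theta:
            classes[a].add(b)
    return classes

def upper_approximation(doc, classes):
    """U(d) = {t : I(t) intersects d}; always a superset of the words in d."""
    terms = set(doc)
    return {t for t, cls in classes.items() if cls & terms}

def trsm_weights(doc, upper, df, n_docs):
    """Assumed classical TRSM weighting, L2-normalised.
    A fixed formula standing in for TRSM-WL's learned weights (not implemented here)."""
    tf = defaultdict(int)
    for t in doc:
        tf[t] += 1
    if not tf:
        return {}
    # Words that occur in d: log-scaled term frequency times idf.
    w = {t: (1 + math.log(f)) * math.log(n_docs / df[t]) for t, f in tf.items()}
    base = min(w.values())
    # Extension words (in U(d) but not in d) get a fraction idf / (1 + idf) of base.
    for t in upper - set(tf):
        idf = math.log(n_docs / df[t])
        w[t] = base * idf / (1 + idf)
    norm = math.sqrt(sum(v * v for v in w.values()))
    return {t: v / norm for t, v in w.items()} if norm > 0 else w

def build_index(docs, theta):
    """Inverted index term -> {doc_id: weight} built over the expanded documents."""
    n_docs = len(docs)
    df = defaultdict(int)
    for doc in docs:
        for t in set(doc):
            df[t] += 1
    classes = tolerance_classes(docs, theta)
    index = defaultdict(dict)
    for doc_id, doc in enumerate(docs):
        upper = upper_approximation(doc, classes)
        for t, wt in trsm_weights(doc, upper, df, n_docs).items():
            if wt > 0:
                index[t][doc_id] = wt
    return index

def search(index, query):
    """Rank documents by the summed index weights of the query words."""
    scores = defaultdict(float)
    for t in query:
        for doc_id, wt in index.get(t, {}).items():
            scores[doc_id] += wt
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy corpus: document 4 does not contain "rough".
docs = [
    ["rough", "set", "tolerance"],
    ["rough", "set", "approximation"],
    ["tolerance", "approximation", "space"],
    ["information", "retrieval", "query"],
    ["set", "theory"],
]
index = build_index(docs, theta=2)
print(search(index, ["rough"]))
```

With `theta=2`, "rough" and "set" are tolerant because they co-occur in two documents. The upper approximation of document 4 therefore contains "rough". Document 4 is returned for the query "rough" even though it does not contain that word. It is ranked below documents 0 and 1, because an extension word always receives a fraction of the smallest in-document weight.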


Keywords: Information retrieval · Rough sets · Document expansion


Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  1. Institute of Mathematics, The University of Warsaw, Warsaw, Poland
  2. Faculty of Mathematics and Computer Science, Nicolaus Copernicus University, Toruń, Poland

Personalised recommendations