Search in Documents Based on Topical Development

  • Jan Martinovič
  • Václav Snášel
  • Jiří Dvorský
  • Pavla Dráždilová
Conference paper
Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 67)


An important service for systems providing access to information is the organization of returned search results. Search results in the vector model may be represented by a sphere in an n-dimensional space. The query is the center of this sphere, and the size of the sphere is determined either by its radius or by the number of documents it contains. The goal of searching is to have all documents relevant to a query inside this sphere. In practice, not all relevant documents fall within it, which is why various methods for improving search results, typically based on expanding the original query, have been developed. Our goal is to use the knowledge of document similarity contained in textual databases to retrieve a larger number of relevant documents while minimizing the number of documents discarded as irrelevant. In this article we define the concept of a k-path (topical development). For developing the individual results of a vector query, we propose the SORT-EACH algorithm, which uses the aforementioned methods for acquiring topical development.
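The sphere picture of vector-model search described above can be illustrated with a minimal sketch. It is not the SORT-EACH algorithm and does not compute a k-path, neither of which is specified in this abstract. It assumes cosine similarity as the closeness measure and expresses the radius as a minimum similarity. The names `cosine_similarity`, `search_sphere`, `min_similarity`, and `max_results`, as well as the toy document vectors, are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def search_sphere(query, documents, min_similarity=None, max_results=None):
    """Return (doc_id, similarity) pairs inside the query's sphere.

    The sphere is bounded by a radius (here: a minimum similarity),
    by a document count, or by both. Both bounds are assumptions made
    for illustration.
    """
    scored = [(doc_id, cosine_similarity(query, vec))
              for doc_id, vec in documents.items()]
    # Most similar documents first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    if min_similarity is not None:
        scored = [pair for pair in scored if pair[1] >= min_similarity]
    if max_results is not None:
        scored = scored[:max_results]
    return scored


# Hypothetical three-term collection.
documents = {
    "d1": [1.0, 0.0, 0.0],
    "d2": [1.0, 1.0, 0.0],
    "d3": [0.0, 0.0, 1.0],
}
query = [1.0, 0.0, 0.0]

by_radius = search_sphere(query, documents, min_similarity=0.5)
by_count = search_sphere(query, documents, max_results=1)
```

In this toy collection, bounding the sphere by radius keeps `d1` and `d2`, while bounding it by count keeps only `d1`. A relevant document such as `d3` that shares no terms with the query stays outside either sphere, which is the gap that query expansion and topical development are meant to address.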


Topical Development, Clustering, Information Retrieval





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Jan Martinovič (1)
  • Václav Snášel (1)
  • Jiří Dvorský (1)
  • Pavla Dráždilová (1)
  1. Department of Computer Science, VŠB - Technical University of Ostrava, Ostrava-Poruba, Czech Republic
