
Efficient Discovery of New Information in Large Text Databases

  • R. B. Bradford
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3495)

Abstract

Intelligence analysts are often faced with large data collections within which information relevant to their interests may be very sparse. Existing mechanisms for searching such data collections present difficulties even when the specific nature of the information being sought is known. Finding unknown information using these mechanisms is very inefficient. This paper presents an approach to this problem, based on iterative application of the technique of latent semantic indexing. In this approach, the body of existing knowledge on the analytic topic of interest is itself used as a query in discovering new relevant information. Performance of the approach is demonstrated on a collection of one million documents. The approach is shown to be highly efficient at discovering new information.
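The abstract does not give implementation details, so the following is only a minimal sketch of what iterative discovery with latent semantic indexing could look like on a toy corpus. It assumes raw term counts, a plain truncated SVD from NumPy, and a query formed as the centroid of the already-known documents. The function names (`build_term_doc_matrix`, `lsi_doc_vectors`, `iterative_discovery`) and the parameters `k`, `per_round`, and `rounds` are hypothetical and are not taken from the paper.

```python
import numpy as np

def build_term_doc_matrix(docs):
    """Raw term-count matrix: rows are terms, columns are documents."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            A[index[w], j] += 1.0
    return A

def lsi_doc_vectors(A, k):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    # One row per document, scaled by the retained singular values.
    return Vt[:k].T * s[:k]

def iterative_discovery(docs, known_ids, k=3, per_round=1, rounds=2):
    """Repeatedly use the known documents as the query and absorb the top hits."""
    D = lsi_doc_vectors(build_term_doc_matrix(docs), k)
    norms = np.linalg.norm(D, axis=1)
    norms[norms == 0] = 1.0          # avoid division by zero for empty vectors
    D = D / norms[:, None]           # unit vectors, so dot product = cosine
    known = list(known_ids)
    for _ in range(rounds):
        query = D[known].mean(axis=0)        # centroid of existing knowledge
        qn = np.linalg.norm(query)
        if qn == 0:
            break
        scores = D @ (query / qn)
        scores[known] = -np.inf              # never re-retrieve known documents
        top = np.argsort(-scores)[:per_round]
        new = [int(i) for i in top if np.isfinite(scores[i])]
        if not new:
            break
        known.extend(new)                    # discovered items join the query
    return known

# Toy corpus with two unrelated topics (illustrative data only).
docs = [
    "uranium enrichment centrifuge plant",
    "centrifuge cascade uranium shipment",
    "shipment of cascade parts to plant",
    "football match goal score",
    "goal keeper football league",
    "league score table match",
]
found = iterative_discovery(docs, known_ids=[0], k=3, per_round=1, rounds=2)
print(found)
```

Starting from document 0 alone, each round adds the closest unseen document to the known set, so the query grows with what has been discovered. A real system would likely apply term weighting and a much larger number of dimensions; both are left out here to keep the sketch short.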



Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • R. B. Bradford (1)
  1. SAIC, Reston
