Efficient Discovery of New Information in Large Text Databases

Bradford, R. B.

doi:10.1007/11427995_31

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 3495))

Included in the following conference series:

International Conference on Intelligence and Security Informatics

Abstract

Intelligence analysts are often faced with large data collections within which information relevant to their interests may be very sparse. Existing mechanisms for searching such data collections present difficulties even when the specific nature of the information being sought is known. Finding unknown information using these mechanisms is very inefficient. This paper presents an approach to this problem, based on iterative application of the technique of latent semantic indexing. In this approach, the body of existing knowledge on the analytic topic of interest is itself used as a query in discovering new relevant information. Performance of the approach is demonstrated on a collection of one million documents. The approach is shown to be highly efficient at discovering new information.
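
The abstract does not give the procedure in detail, so the following is only a minimal sketch of the general idea, under stated assumptions: documents are projected into a low-rank latent semantic indexing (LSI) space by truncated singular value decomposition, the combined term vector of the already-known documents is used as the query, and newly found documents are merged into the known set before the next round. The toy corpus, the function names `term_doc_matrix` and `discover`, the raw-count weighting, the rank `k=2`, and the cosine threshold of 0.5 are all illustrative assumptions and are not taken from the paper.

```python
import numpy as np

# Toy corpus (hypothetical): documents 0-3 concern shipping, 4-6 concern weather.
DOCS = [
    ["shipment", "port", "cargo"],
    ["cargo", "port", "container"],
    ["container", "vessel", "manifest"],
    ["vessel", "manifest", "shipment"],
    ["rain", "wind", "storm"],
    ["storm", "wind", "forecast"],
    ["forecast", "rain", "storm"],
]


def term_doc_matrix(docs):
    """Build a raw term-by-document count matrix from tokenized documents."""
    vocab = sorted({term for doc in docs for term in doc})
    row = {term: i for i, term in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for term in doc:
            A[row[term], j] += 1.0
    return A


def discover(A, seed, k=2, threshold=0.5, max_rounds=10):
    """Iteratively query an LSI space with the known documents themselves.

    Returns the indices of documents discovered beyond the seed set.
    """
    # Truncated SVD: keep the k leading left singular vectors.
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    Uk = U[:, :k]
    coords = Uk.T @ A                      # each column: one document in LSI space
    doc_norms = np.linalg.norm(coords, axis=0)

    known = set(seed)
    found = []
    for _ in range(max_rounds):
        # The query is the combined term vector of everything known so far.
        query = Uk.T @ A[:, sorted(known)].sum(axis=1)
        # Cosine similarity; the small constant guards against zero-length vectors.
        sims = (coords.T @ query) / (doc_norms * np.linalg.norm(query) + 1e-12)
        new = [int(j) for j in np.argsort(-sims)
               if int(j) not in known and sims[j] >= threshold]
        if not new:
            break                          # no further documents above threshold
        found.extend(new)
        known.update(new)
    return found


A = term_doc_matrix(DOCS)
print(sorted(discover(A, seed=[0])))
```

In this toy setting, seeding with document 0 surfaces documents 1 to 3, including document 2, which shares no term with the seed and is reached only through co-occurrence structure captured by the low-rank space. A practical system would typically apply term weighting and a much higher rank, and would put an analyst's relevance judgment between rounds rather than a fixed threshold.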

Author information

Authors and Affiliations

SAIC, Reston, VA
R. B. Bradford

Editor information

Editors and Affiliations

Department of Library and Information Science, Rutgers University
Paul Kantor
School of Communication, Information and Library Studies, Rutgers University, 4 Huntington Street, 08901-1071, New Brunswick, NJ, USA
Gheorghe Muresan
Artificial Solutions, Altonaer Poststraße 13b, 22767, Hamburg, Germany
Fred Roberts
MIS Department, University of Arizona, 85721, Tucson, AZ, USA
Daniel D. Zeng
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Fei-Yue Wang
Department of Management Information Systems, Eller College of Management, The University of Arizona, 85721, AZ, USA
Hsinchun Chen
College of Computing, Georgia Tech Information Security Center, Georgia Institute of Technology, 801 Atlantic Drive, 30332-0280, Atlanta, GA, USA
Ralph C. Merkle

About this paper

Cite this paper

Bradford, R.B. (2005). Efficient Discovery of New Information in Large Text Databases. In: Kantor, P., et al. Intelligence and Security Informatics. ISI 2005. Lecture Notes in Computer Science, vol 3495. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11427995_31

DOI: https://doi.org/10.1007/11427995_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25999-2
Online ISBN: 978-3-540-32063-0
eBook Packages: Computer Science, Computer Science (R0)
