
Internet Document Filtering Using Fourier Domain Scoring

  • Laurence A. F. Park
  • Marimuthu Palaniswami
  • Ramamohanarao Kotagiri
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2168)

Abstract

Most search engines return a large amount of unwanted information. A more thorough filtering process can be applied to these results to pick out the relevant documents. We propose a new method, Frequency Domain Scoring (FDS), which is based on the Fourier transform. FDS performs the filtering by examining the locality of the keywords throughout each document. We compare FDS with two well-known techniques, Latent Semantic Indexing (LSI) and the cosine measure. We found that FDS gives better estimates of how relevant a document is to the query. The other two methods do not perform as well, mainly because they need a wider variety of documents to determine the topic.
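
The contrast between a bag-of-words score and a locality-based score can be illustrated with a minimal sketch. The abstract does not give the FDS formula, so the code below is not the authors' method: the bin count, the function names, and the `locality_score` definition are all assumptions chosen for illustration. It only shows that a Fourier-domain view of keyword positions can separate two documents that the cosine measure scores identically.

```python
import cmath
import math
from collections import Counter


def term_signal(words, term, n_bins=8):
    """Count occurrences of `term` in each of n_bins equal slices of the document."""
    signal = [0] * n_bins
    for i, word in enumerate(words):
        if word == term:
            signal[i * n_bins // len(words)] += 1
    return signal


def dft(signal):
    """Plain discrete Fourier transform of a short real-valued signal."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def cosine_score(words, query):
    """Cosine measure between raw term-count vectors of the document and the query."""
    doc, q = Counter(words), Counter(query)
    dot = sum(doc[t] * q[t] for t in q)
    norm = math.sqrt(sum(v * v for v in doc.values())) * math.sqrt(
        sum(v * v for v in q.values())
    )
    return dot / norm if norm else 0.0


def locality_score(words, query, n_bins=8):
    """Hypothetical locality score (an assumption, not the paper's formula).

    The spectra of the query terms are summed per frequency component and the
    magnitudes of the non-DC components are added up. Terms that fall in the
    same bins add in phase; terms that are far apart partially cancel.
    """
    spectra = [dft(term_signal(words, t, n_bins)) for t in query]
    return sum(abs(sum(s[k] for s in spectra)) for k in range(1, n_bins))


query = ["fourier", "transform"]
# Same words in both documents; only the keyword positions differ.
doc_near = ["fourier", "transform"] + ["x"] * 14
doc_far = ["fourier"] + ["x"] * 7 + ["transform"] + ["x"] * 7
```

Here `doc_near` and `doc_far` contain exactly the same words, so `cosine_score` cannot tell them apart, while `locality_score` ranks the document whose query terms appear together above the one whose terms are half a document apart.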

Keywords

Search Engine; Relevant Document; Average Precision; Latent Semantic Indexing; Document Vector
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Laurence A. F. Park (1)
  • Marimuthu Palaniswami (1)
  • Ramamohanarao Kotagiri (1)
  1. ARC Special Research Centre for Ultra-Broadband Information Networks, Department of Electrical & Electronic Engineering, The University of Melbourne, Victoria, Australia
