Internet Document Filtering Using Fourier Domain Scoring
Most search engines return a large amount of unwanted information. A more thorough filtering process can be applied to these results to identify the relevant documents. We propose a new method, Frequency Domain Scoring (FDS), which is based on the Fourier transform. FDS filters documents by examining the locality of the query keywords throughout each document. We compare FDS with two well-known techniques, Latent Semantic Indexing (LSI) and the cosine measure. We found that FDS gives better estimates of how relevant a document is to the query. The other two methods do not perform as well, mainly because they need a wider variety of documents to determine the topic.
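To illustrate how a Fourier transform can capture keyword locality, the following is a minimal sketch and not the authors' FDS formula, which the abstract does not specify. It assumes each document is split into a fixed number of equal-width bins, that each query term yields a signal of per-bin counts, and that a score can be formed by summing the term spectra and adding up the magnitudes. The names `term_signal`, `dft`, `locality_score`, and the parameter `num_bins` are hypothetical. The intuition is that terms occurring in the same place have spectra with matching phase, which add constructively, while terms far apart partly cancel.

```python
import cmath
import math


def term_signal(tokens, term, num_bins=8):
    """Count occurrences of `term` in each of `num_bins` equal-width bins."""
    signal = [0] * num_bins
    n = len(tokens)
    for i, tok in enumerate(tokens):
        if tok == term:
            signal[i * num_bins // n] += 1
    return signal


def dft(signal):
    """Plain discrete Fourier transform of a short real-valued signal."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def locality_score(tokens, query_terms, num_bins=8):
    """Illustrative score (an assumption, not the published method):
    sum the spectra of all query terms, then add up the magnitudes."""
    spectra = [dft(term_signal(tokens, term, num_bins)) for term in query_terms]
    return sum(abs(sum(spec[k] for spec in spectra)) for k in range(num_bins))


# Two toy documents of 16 tokens, each containing both query terms once.
near = ["x"] * 16
near[4], near[5] = "fourier", "scoring"   # terms adjacent, same bin
far = ["x"] * 16
far[0], far[8] = "fourier", "scoring"     # terms half a document apart

query = ["fourier", "scoring"]
near_score = locality_score(near, query)
far_score = locality_score(far, query)
```

In this toy setup the document with adjacent keywords receives the higher score, even though both documents contain the same terms the same number of times. A purely count-based measure such as the cosine measure would not distinguish the two.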
Keywords: Search Engine, Relevant Document, Average Precision, Latent Semantic Indexing, Document Vector