
Hardware Support for Language Aware Information Mining

  • Michael Freeman
  • Thimal Jayasooriya
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4253)

Abstract

Information retrieval from text, or ‘text mining’, is the process of extracting interesting and non-trivial knowledge from unstructured text. With the ever-increasing amounts of information stored on the web or archived within computing systems, high-performance data processing architectures are required to process this data in real time. The aim of the work presented in this paper is the development of a hardware text mining IP core for use in FPGA-based systems. This paper describes the pre-processing engine we have developed for the PRESENCE II PCI card, which accelerates the identification of significant words within a document, logging their frequency and position. The performance of this system is then compared to an equivalent software implementation using the Lucene software package.
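
The pre-processing step described in the abstract (finding significant words and logging each one's frequency and position) can be illustrated in software. The sketch below is a minimal illustration only, not the PRESENCE II hardware design and not Lucene's API. It assumes that a word is a maximal run of ASCII letters, that "significant" simply means "not in a stop-word list", and it uses a hypothetical stop-word list `STOP_WORDS` and a hypothetical function name `index_significant_words`. A Python dictionary stands in for the hash table.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list (assumption); the paper's actual list is not given here.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is", "or", "for", "on", "from"})

# Assumption: a word is a maximal run of ASCII letters; any other character is a word boundary.
WORD_RE = re.compile(r"[A-Za-z]+")


def index_significant_words(text):
    """Return {word: (frequency, [positions])} for every non-stop word in text.

    Positions are 0-based word offsets in the full token stream. Stop words
    still consume a position, so offsets reflect where a word sits in the document.
    """
    positions = defaultdict(list)  # the dict plays the role of the hash table
    for pos, match in enumerate(WORD_RE.finditer(text)):
        word = match.group().lower()  # case-fold so "The" and "the" match
        if word in STOP_WORDS:
            continue  # discard insignificant words but keep counting positions
        positions[word].append(pos)
    # Frequency is simply the number of logged positions for each word.
    return {word: (len(offsets), offsets) for word, offsets in positions.items()}


if __name__ == "__main__":
    print(index_significant_words("The cat sat on the mat. The cat ran."))
```

For the sample sentence, `cat` is logged with frequency 2 at word positions 1 and 7, while `the` and `on` are dropped as stop words. A hardware engine would typically split tokenising, filtering and hash-table updates into separate pipeline stages rather than running them in one loop as above.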

Keywords

Field Programmable Gate Array, Hash Table, Pipeline Stage, Word Boundary, Java Virtual Machine
These keywords were added by machine and not by the authors.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Michael Freeman (1)
  • Thimal Jayasooriya (1)
  1. Department of Computer Science, University of York, UK
