Hardware Support for Language Aware Information Mining
Information retrieval from text, or ‘text mining’, is the process of extracting interesting and non-trivial knowledge from unstructured text. With ever-increasing amounts of information stored on the web or archived within computing systems, high-performance data processing architectures are required to process this data in real time. The aim of the work presented in this paper is to develop a hardware text mining IP core for use in FPGA-based systems. We describe the pre-processing engine developed for the PRESENCE II PCI card, which accelerates the identification of significant words within a document and logs their frequency and position. The performance of this system is then compared with an equivalent software implementation built on the Lucene software package.
Keywords: Field Programmable Gate Array; Hash Table; Pipeline Stage; Word Boundary; Java Virtual Machine
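As a rough software illustration of the pre-processing task described in the abstract (finding significant words and logging their frequency and position), the sketch below builds a hash table keyed by word. It is not the hardware design from the paper. The stop-word list, the word-boundary rule (runs of ASCII letters and digits) and the function names `index_document` and `frequencies` are all assumptions made for this example, and treating "significant" as "not a stop word" is a simplification.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; the list used by the actual engine is not given.
STOP_WORDS = frozenset({"a", "an", "and", "in", "is", "of", "on", "or", "the", "to"})

# Assumed word-boundary rule: a word is a run of ASCII letters or digits.
WORD_RE = re.compile(r"[A-Za-z0-9]+")


def index_document(text, stop_words=STOP_WORDS):
    """Map each significant word to the list of word positions where it occurs."""
    positions = defaultdict(list)  # hash table: word -> [position, ...]
    for position, match in enumerate(WORD_RE.finditer(text)):
        word = match.group().lower()
        if word not in stop_words:
            positions[word].append(position)
    return dict(positions)


def frequencies(index):
    """Derive each word's frequency from the number of logged positions."""
    return {word: len(pos) for word, pos in index.items()}


index = index_document("The cat sat on the mat and the cat slept")
counts = frequencies(index)
```

Positions here count every word in the document, including stop words, so the logged offsets still refer to the original text. A character offset could be logged instead by using `match.start()`.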