Next Generation Information Technologies and Systems

Volume 5831 of the series Lecture Notes in Computer Science, p. 1

Searching in the “Real World”

(Abstract of Invited Plenary Talk)
  • Ophir Frieder, Information Retrieval Laboratory, Department of Computer Science, Illinois Institute of Technology


For many, "searching" is considered a mostly solved problem. For text processing, this belief is well founded. However, most "real world" search applications involve "complex documents", and search over such documents is far from solved. Complex documents, or less formally, "real world documents", comprise a mixture of images, text, signatures, tables, logos, watermarks, stamps, etc., and are often available only in scanned hardcopy formats. Search systems for such document collections are currently unavailable.

We describe our efforts at building a complex document information processing (CDIP) prototype. This prototype integrates mature "point solution" technologies, including OCR, signature matching, handwritten word spotting, and search and mining approaches, among others, to yield a system capable of searching "real world documents". The described prototype demonstrates the adage that "the whole is greater than the sum of its parts".