Automatic Fax Routing

  • Paul Viola
  • James Rinker
  • Martin Law
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3163)

Abstract

We present a system for automatic fax routing that processes incoming fax images and forwards them to the correct email alias. The system first performs optical character recognition (OCR) to find words and, in some cases, parts of words; the OCR output is noisy, and we have observed error rates as high as 10 to 20 percent. For each of these “noisy” words, a set of features is computed, including internal text features, location features, and relationship features. These features are combined to estimate the relevance of the word in the context of the page and the recipient database. The parameters of the word relevance function are learned from training data using the AdaBoost learning algorithm. Words are then compared to the database of recipients to find likely matches. Finally, the recipients are ranked by combining the quality of the matches with the relevance of the words. Experiments on a large set of real data demonstrate the effectiveness of the system.
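
The final matching and ranking step described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the abstract does not say how match quality is measured or how it is combined with relevance, so this sketch uses a unit-cost edit distance normalized by string length and multiplies it by a per-word relevance score. The names `Word`, `match_quality`, and `rank_recipients` are hypothetical, and the relevance values are supplied by hand rather than produced by a learned AdaBoost model.

```python
from dataclasses import dataclass


def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance computed by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def match_quality(word: str, name: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical after case folding.

    Assumption: normalizing by the longer string is one simple choice.
    """
    w, n = word.lower(), name.lower()
    longest = max(len(w), len(n))
    if longest == 0:
        return 0.0
    return 1.0 - edit_distance(w, n) / longest


@dataclass
class Word:
    text: str          # noisy OCR output
    relevance: float   # hypothetical output of a learned relevance function


def rank_recipients(words, recipients):
    """Score each recipient by its best relevance-weighted word match.

    Assumption: taking the product and then the maximum is an illustrative
    combination rule, not the one used in the paper.
    """
    scores = {}
    for alias, names in recipients.items():
        best = 0.0
        for word in words:
            for name in names:
                best = max(best, word.relevance * match_quality(word.text, name))
        scores[alias] = best
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # "Vio1a" simulates an OCR error in a highly relevant word.
    words = [Word("Vio1a", 0.9), Word("Redmond", 0.2), Word("Rinker", 0.1)]
    recipients = {
        "viola@example.com": ["Paul", "Viola"],
        "rinker@example.com": ["James", "Rinker"],
    }
    for alias, score in rank_recipients(words, recipients):
        print(f"{alias}: {score:.2f}")
```

In this toy input the misrecognized but highly relevant word still ranks its recipient first, while an exact match on a low-relevance word contributes little. That is the behavior the abstract motivates by weighting matches with word relevance.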

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Paul Viola (1)
  • James Rinker (1)
  • Martin Law (1)

  1. Microsoft Research, Redmond, USA
