A comparison of automatic and manual zoning

Taghva, Kazem; Borsack, Julie; Lumos, Steven; Condit, Allen

doi:10.1007/s10032-003-0116-x

A comparison of automatic and manual zoning

An information retrieval prospective

Published: April 2003

Volume 6, pages 230–235 (2003)

Document Analysis and Recognition

Kazem Taghva¹, Julie Borsack¹, Steven Lumos¹ & Allen Condit¹

Abstract.

In this paper, we study the effects of automatic zoning on retrieval and ranking variability. We show that OCR-generated text from automatic zoning, followed by postprocessing, produces retrieval results equivalent to OCR-generated text from manual zoning. We further show that there is a strong linear association between the ranked query results obtained from these two methods of zoning.
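
As an illustration of what a "linear association" between two sets of ranked results can mean, the sketch below computes a Pearson correlation coefficient between the rank positions that two collections assign to the same documents for one query. This is only a minimal sketch: the abstract does not state which statistic the authors used, so the choice of Pearson's r is an assumption, and the function name `pearson_r` and the rank values are hypothetical, not data from the paper.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical rank positions of the same five documents for one query:
# one list from the manually zoned collection, one from the automatically
# zoned collection. These values are made up for illustration.
manual_ranks = [1, 2, 3, 4, 5]
auto_ranks = [1, 3, 2, 4, 5]

r = pearson_r(manual_ranks, auto_ranks)
print(f"Pearson r between the two rankings: {r:.2f}")
```

A value of r close to 1 would indicate that documents ranked highly in one collection tend to be ranked highly in the other. When the inputs are rank positions without ties, as here, this quantity coincides with Spearman's rank correlation.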

Author information

Authors and Affiliations

Information Science Research Institute, P.O. Box 454021, Las Vegas, NV 89154-4021, USA
Kazem Taghva, Julie Borsack, Steven Lumos & Allen Condit

Additional information

Received: 17 July 2003, Accepted: 18 October 2003, Published online: 6 February 2004

Information Science Research Institute: e-mail isri@isri.unlv.edu

About this article

Cite this article

Taghva, K., Borsack, J., Lumos, S. et al. A comparison of automatic and manual zoning. IJDAR 6, 230–235 (2003). https://doi.org/10.1007/s10032-003-0116-x

Issue Date: April 2003
DOI: https://doi.org/10.1007/s10032-003-0116-x
