Detecting figures and part labels in patents: competition-based development of graphics recognition algorithms

  • Christoph Riedl
  • Richard Zanibbi
  • Marti A. Hearst
  • Siyu Zhu
  • Michael Menietti
  • Jason Crusan
  • Ivan Metelsky
  • Karim R. Lakhani
Original Paper

Abstract

Most United States Patent and Trademark Office (USPTO) patent documents contain drawing pages which describe inventions graphically. By convention and by rule, these drawings contain figures and parts that are annotated with numbered labels but not with text. As a result, readers must scan the document to find the description of a given part label. To make progress toward automatic creation of ‘tool-tips’ and hyperlinks from part labels to their associated descriptions, the USPTO hosted a month-long online competition in which participants developed algorithms to detect figures and diagram part labels. The challenge drew 232 teams of two, of which 70 teams (30%) submitted solutions. An unusual feature was that each patent was represented by a 300-dpi page scan along with an HTML file containing patent text, allowing integration of text processing and graphics recognition in participant algorithms. The design and performance of the top-5 systems are presented along with a system developed after the competition, illustrating that the winning teams produced near state-of-the-art results under strict time and computation constraints. The first place system used the provided HTML text, obtaining a harmonic mean of recall and precision (F-measure) of 88.57% for figure region detection, 78.81% for figure regions with correctly recognized figure titles, and 70.98% for part label detection and recognition. Data and source code for the top-5 systems are available through the online UCI Machine Learning Repository to support follow-on work by others in the document recognition community.
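
The F-measure quoted above is the harmonic mean of recall and precision. The following is a minimal sketch of that calculation, not the competition's scoring code: the function names and the detection counts are hypothetical and chosen only for illustration, and the rules for deciding when a detected region or label counts as correct are not specified here.

```python
def precision_recall(true_pos: int, false_pos: int, false_neg: int) -> tuple[float, float]:
    """Return (precision, recall) from raw detection counts.

    The counts are assumed to come from some matching step between
    detected and ground-truth items; that step is not modeled here.
    """
    detected = true_pos + false_pos   # everything the system reported
    relevant = true_pos + false_neg   # everything in the ground truth
    precision = true_pos / detected if detected else 0.0
    recall = true_pos / relevant if relevant else 0.0
    return precision, recall


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    if precision + recall == 0:
        return 0.0  # convention: no correct detections gives F = 0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative counts only, not data from the competition.
    p, r = precision_recall(true_pos=8, false_pos=2, false_neg=8)
    print(f"precision={p:.2f} recall={r:.2f} F={f_measure(p, r):.4f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a system cannot reach a high F-measure by maximizing recall at the expense of precision, or the reverse.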

Keywords

Graphics recognition, Text detection, Optical character recognition (OCR), Competitions, Crowdsourcing

Acknowledgments

We are grateful for helpful comments provided by Ahmad Ahmad and the anonymous reviewers. This research was supported in part by the NASA Tournament Laboratory and the United States Patent and Trademark Office (USPTO).

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Christoph Riedl (1, 5)
  • Richard Zanibbi (2)
  • Marti A. Hearst (3)
  • Siyu Zhu (4)
  • Michael Menietti (5)
  • Jason Crusan (6)
  • Ivan Metelsky (7)
  • Karim R. Lakhani (8)

  1. D’Amore-McKim School of Business, and College of Computer and Information Science, Northeastern University, Boston, USA
  2. Department of Computer Science, Rochester Institute of Technology, Rochester, USA
  3. School of Information, UC Berkeley, Berkeley, USA
  4. Center for Imaging Science, Rochester Institute of Technology, Rochester, USA
  5. Institute for Quantitative Social Science, Harvard University, Cambridge, USA
  6. Advanced Exploration Systems Division, NASA, Washington, USA
  7. TopCoder Inc., Glastonbury, USA
  8. Department of Technology and Operations Management, Harvard Business School, Boston, USA
