Abstract
This paper explores the use of attributes for document image querying and retrieval. Existing document image retrieval techniques present several drawbacks: textual searches are limited to text, query-by-example searches require having a sample query document on hand, and layout-based searches rigidly assign documents to one of several preset classes. Attributes have yet to be fully exploited in document image analysis. We describe document images based on attributes and use those descriptions to form a new querying paradigm for document image retrieval that addresses the above limitations: attribute-based document image retrieval (ABDIR). We create attribute-based descriptions of the documents using an expandable set of individual, independent attribute classifiers built on convolutional neural network architectures, and combine these descriptions into queries of variable complexity that retrieve a ranked list of document images. ABDIR allows users to search flexibly for documents based on memorable visual features of their contents, with queries like “Find documents that have a one-column layout, are table dominant, and are colorful” or “Find historical documents that are illuminated and have see-through artifacts”. Experiments on the recent PubLayNet and HisIR19 datasets demonstrate the system’s ability to extract various document image attributes with high accuracy, with Darknet-53 performing best, and show very promising results for document image retrieval. ABDIR is scalable and versatile: attributes are easy to change, add, and remove, and queries are easy to adapt to new domains. It provides document image retrieval capabilities that are not possible, or are impractical, with other paradigms.
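The retrieval mechanism the abstract describes (independent attribute classifiers whose per-document confidences are combined into a multi-attribute query that returns a ranked list) can be sketched minimally in Python. This is an illustration only: the attribute names, the confidence values, and the product-of-confidences fusion are assumptions for the sketch, not details taken from the paper.

```python
# Illustrative sketch, not the authors' implementation: rank documents by a
# multi-attribute query, assuming each document already has a confidence
# score from an independent classifier for each attribute.
from math import prod

def rank_by_attributes(doc_scores, query_attributes):
    """Return document IDs sorted by the product of their confidences
    for the queried attributes (higher product = better match)."""
    return sorted(
        doc_scores,
        key=lambda d: prod(doc_scores[d].get(a, 0.0) for a in query_attributes),
        reverse=True,
    )

# Hypothetical per-attribute classifier confidences for three documents.
doc_scores = {
    "doc1": {"one_column": 0.95, "table_dominant": 0.90, "colorful": 0.80},
    "doc2": {"one_column": 0.40, "table_dominant": 0.85, "colorful": 0.95},
    "doc3": {"one_column": 0.99, "table_dominant": 0.10, "colorful": 0.60},
}

# Query: "one-column layout AND table dominant AND colorful"
print(rank_by_attributes(doc_scores, ["one_column", "table_dominant", "colorful"]))
# → ['doc1', 'doc2', 'doc3']  (fused scores 0.684, 0.323, 0.059)
```

Because each attribute has its own classifier, adding or removing an attribute only changes the keys present in `doc_scores`; the ranking code is unaffected, which mirrors the scalability claim in the abstract.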
Data availability
The authors used the publicly available PubLayNet and HisIR19 datasets.
Notes
Here, we consider mixed content as a mixture of text, tables, and figures.
Can be applied or pendent.
Illuminated manuscripts: Handwritten books with painted flourishes, such as borders and miniature illustrations, that typically include precious metals (gold or silver).
See-through: One of the most common degradations affecting historical documents that are written or printed on both sides of the page; an undesired pattern in the background caused by the text/ink on the reverse side of the page [44].
References
Feris, R.S., et al.: Introduction to visual attributes. In: Feris, R.S., Lampert, C., Parikh, D. (eds.) Visual Attributes. Advances in Computer Vision and Pattern Recognition, pp. 1–7. Springer, Cham (2017)
Hwang, S.J., et al.: Sharing features between objects and their attributes. In: CVPR, IEEE, pp 1761–8 (2011)
Zhang, F., et al.: Grouped attribute strength-based image retrieval. J. Electron. Imaging 28(1), 013048 (2019)
Lampert, C.H., et al.: Attribute-based classification for zero-shot visual object categorization. IEEE Trans. Patt. Anal. Mach. Intell. 36(3), 453–65 (2013)
Liu, J., et al.: Recognizing human actions by attributes. In: CVPR, IEEE, pp 3337–44 (2011)
Yan, X., et al.: Attribute2Image: Conditional image generation from visual attributes. In: ECCV, pp. 776–91. Springer, Cham (2016)
Almazán, J., et al.: Word spotting and recognition with embedded attributes. IEEE Trans. Patt. Anal. Mach. Intell. 36(12), 2552–66 (2014)
Ferrari, V., Zisserman, A.: Learning visual attributes. Adv. Neural Inf. Process Syst. 433–40 (2007)
Engelkamp, J., Zimmer, H.D.: Human Memory: A Multimodal Approach. Hogrefe & Huber Publishers, Seattle (1994)
Blanc-Brude, T., Scapin, D.L.: What do people recall about their documents? Implications for desktop search tools. In: IUI, ACM, pp 102–11 (2007)
Borkin, M.A., et al.: What makes a visualization memorable? IEEE Trans. Vis. Comput. Gr. 19(12), 2306–15 (2013)
Giotis, A.P., et al.: A survey of document image word spotting techniques. Patt. Recognit. 68, 310–32 (2017)
Duan, L.Y., et al.: Towards mobile document image retrieval for digital library. IEEE Trans. Multimed. 16(2), 346–59 (2013)
Roy, S.D., et al.: Camera-based document image matching using multi-feature probabilistic information fusion. Patt. Recognit. Lett. 58, 42–50 (2015)
Sharma, N., et al.: Signature and logo detection using deep CNN for document image retrieval. In: ICFHR, IEEE, pp 416–22 (2018)
Zhu, G., Doermann, D.: Logo matching for document image retrieval. In: ICDAR’09, IEEE, pp 606–10 (2009)
Ubeda, I., et al.: Improving pattern spotting in historical documents using feature pyramid networks. Patt. Recognit. Lett. 131, 398–404 (2020)
Marinai, S., et al.: Layout based document image retrieval by means of XY tree reduction. In: ICDAR, IEEE, pp 432–6 (2005)
Kumar, J., et al.: Structural similarity for document image classification and retrieval. Patt. Recognit. Lett. 43, 119–26 (2014)
Marinai, S., et al.: Digital libraries and document image retrieval techniques: A survey. In: Biba, M., Xhafa, F. (eds.) Learning Structure and Schemas from Documents, Studies in Computational Intelligence, vol. 375, pp. 181–204. Springer, Berlin (2011)
Siddiquie, B., et al.: Image ranking and retrieval based on multi-attribute queries. In: CVPR, IEEE, pp 801–8 (2011)
Liu, Z., et al.: DeepFashion: Powering robust clothes recognition and retrieval with rich annotations. In: CVPR, IEEE, pp 1096–104 (2016)
Zhao, B., et al.: Memory-augmented attribute manipulation networks for interactive fashion search. In: CVPR, IEEE, pp 1520–8 (2017)
Kumar, N., et al.: Describable visual attributes for face verification and image search. IEEE Trans. Patt. Anal. Mach. Intell. 33(10), 1962–77 (2011)
An, L., et al.: Scalable attribute-driven face image retrieval. Neurocomput. 172, 215–24 (2016)
Fang, Y., Yuan, Q.: Attribute-enhanced metric learning for face retrieval. EURASIP J. Image Video Process. 2018, 44 (2018)
Sandeep, R.N., et al.: Relative parts: Distinctive parts for learning relative attributes. In: CVPR, IEEE, pp 3614–21 (2014)
Kovashka, A., et al.: Whittlesearch: Interactive image search with relative attribute feedback. Int. J. Comput. Vis. 115(2), 185–210 (2015)
Yu, Z., Kovashka, A.: Syntharch: Interactive image search with attribute-conditioned synthesis. In: CVPRW, IEEE/CVF, pp 170–1 (2020)
Albu, A.B., Nagy, G.: Imaging reality and abstraction: an exploration of natural and symbolic patterns. In: VISIGRAPP (VISAPP), SCITEPRESS, pp 415–22 (2021)
Rawat, W., Wang, Z.: Deep convolutional neural networks for image classification: A comprehensive review. Neural Comput. 29(9), 2352–449 (2017)
He, K., et al.: Deep residual learning for image recognition. In: CVPR, IEEE, pp 770–8 (2016)
Huang, G., et al.: Densely connected convolutional networks. In: CVPR, IEEE, pp 4700–8 (2017)
Szegedy, C., et al.: Rethinking the inception architecture for computer vision. In: CVPR, IEEE, pp 2818–26 (2016)
Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: CVPR, IEEE, pp 1251–8 (2017)
Szegedy, C., et al.: Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: AAAI-17, pp 4278–84 (2017)
Redmon, J., Farhadi, A.: YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767 (2018)
Zoph, B., et al.: Learning transferable architectures for scalable image recognition. In: CVPR, IEEE, pp 8697–710 (2018)
Tan, M., Le, Q.: EfficientNet: Rethinking model scaling for convolutional neural networks. In: ICML, PMLR, pp 6105–14 (2019)
Zhang, C., et al.: ResNet or DenseNet? Introducing dense shortcuts to ResNet. In: WACV, IEEE/CVF, pp 3550–9 (2021)
Jiao, L., Zhao, J.: A survey on the new generation of deep learning in image processing. IEEE Access 7, 172231–63 (2019)
Zhong, X., et al.: PubLayNet: Largest dataset ever for document layout analysis. In: ICDAR, IEEE, pp 1015–22 (2019)
Christlein, V., et al.: ICDAR 2019 competition on image retrieval for historical handwritten documents. In: ICDAR, IEEE, pp 1505–9 (2019)
Tonazzini, A., Bedini, L.: Restoration of recto-verso colour documents using correlated component analysis. EURASIP J. Adv. Sign. Process. 2013, 58 (2013)
Deng, J., et al.: ImageNet: A large-scale hierarchical image database. In: CVPR’09, IEEE, pp 248–55 (2009)
Manning, C.D., et al.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2009)
US National Archives (2022) Project BLUE BOOK: Unidentified Flying Objects. https://www.archives.gov/research/military/air-force/ufos. Accessed 18 Jan 2022
Acknowledgements
The authors would like to thank Mike Mabey at QuirkLogic Inc. for his valuable input on use cases and usability.
Funding
This research was supported by the Natural Sciences and Engineering Research Council of Canada and QuirkLogic Inc. through the CRD Grants program (No. CRDPJ 525586-18).
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by M. Cote. The first draft of the manuscript was written by M. Cote, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Cote, M., Branzan Albu, A. Attribute-based document image retrieval. IJDAR 27, 57–71 (2024). https://doi.org/10.1007/s10032-023-00447-6