Abstract
This paper explores the use of attributes for document image querying and retrieval. Existing document image retrieval techniques present several drawbacks: textual searches are limited to text, query-by-example searches require having a sample query document on hand, and layout-based searches rigidly assign documents to one of several preset classes. Attributes have yet to be fully exploited in document image analysis. We describe document images based on attributes and use those descriptions to form a new querying paradigm for document image retrieval that addresses the above limitations: attribute-based document image retrieval (ABDIR). We create attribute-based descriptions of the documents using an expandable set of individual, independent attribute classifiers built on convolutional neural network architectures, and combine these descriptions into queries of variable complexity that retrieve a ranked list of document images. ABDIR allows users to search flexibly for documents based on memorable visual features of their contents, with queries like “Find documents that have a one-column layout, are table dominant, and are colorful” or “Find historical documents that are illuminated and have see-through artifacts”. Experiments on the recent PubLayNet and HisIR19 datasets demonstrate the system’s ability to extract various document image attributes with high accuracy, with Darknet-53 performing best, and show very promising results for document image retrieval. ABDIR is scalable and versatile: attributes are easy to change, add, and remove, and queries are easy to adapt to new domains. It provides document image retrieval capabilities that are not possible, or are impractical, with other paradigms.
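The retrieval mechanism the abstract describes (independent attribute classifiers whose per-document confidences are combined into a multi-attribute query that returns a ranked list) can be sketched minimally in Python. This is an illustration only: the attribute names, the confidence values, and the product-of-confidences fusion are assumptions for the sketch, not details taken from the paper.

```python
# Illustrative sketch, not the authors' implementation: rank documents by a
# multi-attribute query, assuming each document already has a confidence
# score from an independent classifier for each attribute.
from math import prod

def rank_by_attributes(doc_scores, query_attributes):
    """Return document IDs sorted by the product of their confidences
    for the queried attributes (higher product = better match)."""
    return sorted(
        doc_scores,
        key=lambda d: prod(doc_scores[d].get(a, 0.0) for a in query_attributes),
        reverse=True,
    )

# Hypothetical per-attribute classifier confidences for three documents.
doc_scores = {
    "doc1": {"one_column": 0.95, "table_dominant": 0.90, "colorful": 0.80},
    "doc2": {"one_column": 0.40, "table_dominant": 0.85, "colorful": 0.95},
    "doc3": {"one_column": 0.99, "table_dominant": 0.10, "colorful": 0.60},
}

# Query: "one-column layout AND table dominant AND colorful"
print(rank_by_attributes(doc_scores, ["one_column", "table_dominant", "colorful"]))
# → ['doc1', 'doc2', 'doc3']  (fused scores 0.684, 0.323, 0.059)
```

Because each attribute has its own classifier, adding or removing an attribute only changes the keys present in `doc_scores`; the ranking code is unaffected, which mirrors the scalability claim in the abstract.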
Data availability
The authors used the publicly available PubLayNet and HisIR19 datasets.
Notes
Here, we consider mixed content as a mixture of text, tables, and figures.
Can be applied or pendent.
Illuminated manuscripts: Handwritten books with painted flourishes, such as borders and miniature illustrations, that typically include precious metals (gold or silver).
See-through: One of the most common degradations affecting historical documents that are written or printed on both sides of the page; an undesired pattern in the background caused by the text/ink on the reverse side of the page [44].
References
Feris, R.S., et al.: Introduction to visual attributes. In: Feris, R.S., Lampert, C., Parikh, D. (eds.) Visual Attributes. Advances in Computer Vision and Pattern Recognition, pp. 1–7. Springer, Cham (2017)
Hwang, S.J., et al.: Sharing features between objects and their attributes. In: CVPR, IEEE, pp 1761–8 (2011)
Zhang, F., et al.: Grouped attribute strength-based image retrieval. J. Electron. Imaging 28(1), 013048 (2019)
Lampert, C.H., et al.: Attribute-based classification for zero-shot visual object categorization. IEEE Trans. Patt. Anal. Mach. Intell. 36(3), 453–65 (2013)
Liu, J., et al.: Recognizing human actions by attributes. In: CVPR, IEEE, pp 3337–44 (2011)
Yan, X., et al.: Attribute2Image: Conditional image generation from visual attributes. In: ECCV, pp. 776–91. Springer, Cham (2016)
Almazán, J., et al.: Word spotting and recognition with embedded attributes. IEEE Trans. Patt. Anal. Mach. Intell. 36(12), 2552–66 (2014)
Ferrari, V., Zisserman, A.: Learning visual attributes. Adv. Neural Inf. Process Syst. 433–40 (2007)
Engelkamp, J., Zimmer, H.D.: Human Memory: A Multimodal Approach. Hogrefe & Huber Publishers, Seattle (1994)
Blanc-Brude, T., Scapin, D.L.: What do people recall about their documents? Implications for desktop search tools. In: IUI, ACM, pp 102–11 (2007)
Borkin, M.A., et al.: What makes a visualization memorable? IEEE Trans. Vis. Comput. Gr. 19(12), 2306–15 (2013)
Giotis, A.P., et al.: A survey of document image word spotting techniques. Patt. Recognit. 68, 310–32 (2017)
Duan, L.Y., et al.: Towards mobile document image retrieval for digital library. IEEE Trans. Multimed. 16(2), 346–59 (2013)
Roy, S.D., et al.: Camera-based document image matching using multi-feature probabilistic information fusion. Patt. Recognit. Lett. 58, 42–50 (2015)
Sharma, N., et al.: Signature and logo detection using deep CNN for document image retrieval. In: ICFHR, IEEE, pp 416–22 (2018)
Zhu, G., Doermann, D.: Logo matching for document image retrieval. In: ICDAR’09, IEEE, pp 606–10 (2009)
Ubeda, I., et al.: Improving pattern spotting in historical documents using feature pyramid networks. Patt. Recognit. Lett. 131, 398–404 (2020)
Marinai, S., et al.: Layout based document image retrieval by means of XY tree reduction. In: ICDAR, IEEE, pp 432–6 (2005)
Kumar, J., et al.: Structural similarity for document image classification and retrieval. Patt. Recognit. Lett. 43, 119–26 (2014)
Marinai, S., et al.: Digital libraries and document image retrieval techniques: A survey. In: Biba, M., Xhafa, F. (eds.) Learning Structure and Schemas from Documents, Studies in Computational Intelligence, vol. 375, pp. 181–204. Springer, Berlin (2011)
Siddiquie, B., et al.: Image ranking and retrieval based on multi-attribute queries. In: CVPR, IEEE, pp 801–8 (2011)
Liu, Z., et al.: DeepFashion: Powering robust clothes recognition and retrieval with rich annotations. In: CVPR, IEEE, pp 1096–104 (2016)
Zhao, B., et al.: Memory-augmented attribute manipulation networks for interactive fashion search. In: CVPR, IEEE, pp 1520–8 (2017)
Kumar, N., et al.: Describable visual attributes for face verification and image search. IEEE Trans. Patt. Anal. Mach. Intell. 33(10), 1962–77 (2011)
An, L., et al.: Scalable attribute-driven face image retrieval. Neurocomput. 172, 215–24 (2016)
Fang, Y., Yuan, Q.: Attribute-enhanced metric learning for face retrieval. EURASIP J. Image Video Process. 2018, 44 (2018)
Sandeep, R.N., et al.: Relative parts: Distinctive parts for learning relative attributes. In: CVPR, IEEE, pp 3614–21 (2014)
Kovashka, A., et al.: Whittlesearch: Interactive image search with relative attribute feedback. Int. J. Comput. Vis. 115(2), 185–210 (2015)
Yu, Z., Kovashka, A.: Syntharch: Interactive image search with attribute-conditioned synthesis. In: CVPRW, IEEE/CVF, pp 170–1 (2020)
Albu, A.B., Nagy, G.: Imaging reality and abstraction: an exploration of natural and symbolic patterns. In: VISIGRAPP (VISAPP), SCITEPRESS, pp 415–22 (2021)
Rawat, W., Wang, Z.: Deep convolutional neural networks for image classification: A comprehensive review. Neural Comput. 29(9), 2352–449 (2017)
He, K., et al.: Deep residual learning for image recognition. In: CVPR, IEEE, pp 770–8 (2016)
Huang, G., et al.: Densely connected convolutional networks. In: CVPR, IEEE, pp 4700–8 (2017)
Szegedy, C., et al.: Rethinking the inception architecture for computer vision. In: CVPR, IEEE, pp 2818–26 (2016)
Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: CVPR, IEEE, pp 1251–8 (2017)
Szegedy, C., et al.: Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: AAAI-17, pp 4278–84 (2017)
Redmon, J., Farhadi, A.: YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767 (2018)
Zoph, B., et al.: Learning transferable architectures for scalable image recognition. In: CVPR, IEEE, pp 8697–710 (2018)
Tan, M., Le, Q.: EfficientNet: Rethinking model scaling for convolutional neural networks. In: ICML, PMLR, pp 6105–14 (2019)
Zhang, C., et al.: ResNet or DenseNet? Introducing dense shortcuts to ResNet. In: WACV, IEEE/CVF, pp 3550–9 (2021)
Jiao, L., Zhao, J.: A survey on the new generation of deep learning in image processing. IEEE Access 7, 172231–63 (2019)
Zhong, X., et al.: PubLayNet: Largest dataset ever for document layout analysis. In: ICDAR, IEEE, pp 1015–22 (2019)
Christlein, V., et al.: ICDAR 2019 competition on image retrieval for historical handwritten documents. In: ICDAR, IEEE, pp 1505–9 (2019)
Tonazzini, A., Bedini, L.: Restoration of recto-verso colour documents using correlated component analysis. EURASIP J. Adv. Sign. Process. 2013, 58 (2013)
Deng, J., et al.: ImageNet: A large-scale hierarchical image database. In: CVPR’09, IEEE, pp 248–55 (2009)
Manning, C.D., et al.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2009)
US National Archives (2022) Project BLUE BOOK: Unidentified Flying Objects. https://www.archives.gov/research/military/air-force/ufos. Accessed 18 Jan 2022
Acknowledgements
The authors would like to thank Mike Mabey at QuirkLogic Inc. for his valuable input on use cases and usability.
Funding
This research was supported by the Natural Sciences and Engineering Research Council of Canada and QuirkLogic Inc. through the CRD Grants program (No. CRDPJ 525586-18).
Author information
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by M. Cote. The first draft of the manuscript was written by M. Cote, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Cote, M., Branzan Albu, A. Attribute-based document image retrieval. IJDAR 27, 57–71 (2024). https://doi.org/10.1007/s10032-023-00447-6