
Font Shape-to-Impression Translation

Part of the Lecture Notes in Computer Science book series (LNCS, volume 13237)


Different fonts give different impressions, such as elegant, scary, and cool. This paper tackles part-based shape-impression analysis based on the Transformer architecture, which can model the correlations among local parts through its self-attention mechanism. This ability reveals how combinations of local parts realize a specific impression of a font. The versatility of the Transformer allows us to realize two very different approaches to the analysis, i.e., multi-label classification and translation. A quantitative evaluation shows that our Transformer-based approaches estimate font impressions from a set of local parts more accurately than other approaches. A qualitative evaluation then indicates the local parts that are important for a specific impression.
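As an illustration of the part-based analysis sketched above, the following toy example runs single-head self-attention over a set of local shape descriptors (e.g., 128-D SIFT vectors) and pools the result into multi-label impression probabilities. All weights, dimensions, and function names here are hypothetical assumptions for illustration, not the paper's actual model or training setup:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(parts, Wq, Wk, Wv):
    # parts: (n_parts, d) local shape descriptors, e.g., 128-D SIFT vectors
    Q, K, V = parts @ Wq, parts @ Wk, parts @ Wv
    attn = softmax(Q @ K.T / np.sqrt(K.shape[1]))  # (n_parts, n_parts)
    return attn @ V, attn

def impression_scores(parts, Wq, Wk, Wv, Wout):
    ctx, attn = self_attention(parts, Wq, Wk, Wv)
    pooled = ctx.mean(axis=0)  # permutation-invariant pooling over parts
    # Sigmoid per label: multi-label (not mutually exclusive) impressions
    probs = 1.0 / (1.0 + np.exp(-(pooled @ Wout)))
    return probs, attn

rng = np.random.default_rng(0)
d, n_parts, n_labels = 128, 10, 5  # illustrative sizes
parts = rng.normal(size=(n_parts, d))
Wq, Wk, Wv = (rng.normal(scale=d**-0.5, size=(d, d)) for _ in range(3))
Wout = rng.normal(scale=d**-0.5, size=(d, n_labels))
probs, attn = impression_scores(parts, Wq, Wk, Wv, Wout)
```

The attention matrix `attn` is the kind of quantity one would inspect to see which local parts contribute to a predicted impression, since each row distributes weight over all parts.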


  • Font shape
  • Impression analysis
  • Translator



  1.

    The \(\langle \mathtt{PAD}\rangle \) token is used when we train the decoder. \(\langle \mathtt{PAD}\rangle \) tokens are appended to the end of the ground truth (i.e., the sequence of labeled impressions) until its length reaches the maximum output length.

  2.

    To justify selecting SIFT as the local shape descriptor, we also tried SURF descriptors in its place and found no significant difference between them. More precisely, the multi-label classifier using SURF achieved about 0.16 points higher mAP and 0.05 points lower F1@all than the one using SIFT.
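The padding scheme described in footnote 1 amounts to the following sketch; the token spelling and helper name are hypothetical, not taken from the paper:

```python
PAD = "<PAD>"  # hypothetical spelling of the padding token

def pad_impressions(labels, max_len):
    # Append <PAD> tokens to the ground-truth impression sequence
    # until it reaches the decoder's maximum output length.
    assert len(labels) <= max_len
    return labels + [PAD] * (max_len - len(labels))

padded = pad_impressions(["elegant", "cool"], 5)
# padded == ["elegant", "cool", "<PAD>", "<PAD>", "<PAD>"]
```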




Author information

Corresponding author

Correspondence to Masaya Ueda.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper


Cite this paper

Ueda, M., Kimura, A., Uchida, S. (2022). Font Shape-to-Impression Translation. In: Uchida, S., Barney, E., Eglin, V. (eds) Document Analysis Systems. DAS 2022. Lecture Notes in Computer Science, vol 13237. Springer, Cham.


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-06554-5

  • Online ISBN: 978-3-031-06555-2

  • eBook Packages: Computer Science (R0)