Machine Vision and Applications, Volume 28, Issue 5–6, pp 551–568

Learning typographic style: from discrimination to synthesis

  • Shumeet Baluja
Original Paper


Typography is a ubiquitous art form that affects our understanding, perception and trust in what we read. Thousands of different font-faces have been created with enormous variations in the characters. In this paper, we learn the style of a font by analyzing a small subset of only four letters. From these four letters, we learn two tasks. The first is a discrimination task: given the four letters and a new candidate letter, does the new letter belong to the same font? The second is a synthesis task: given the four basis letters, can we generate all of the other letters with the same characteristics as those in the basis set? We use deep neural networks to address both tasks, measure the results quantitatively and qualitatively in a variety of novel ways, and present a thorough investigation of the strengths and weaknesses of the approach. All of the experiments are conducted with publicly available font sets.
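The two tasks described above can be read as two ways of pairing inputs with targets. The following is only a minimal sketch of that data layout, not the paper's implementation: the choice of basis letters, the function names, the dictionary format for fonts, and the toy 1x1 "glyphs" are all assumptions made for illustration, since the abstract does not specify them.

```python
import string

# Hypothetical basis letters; the abstract does not say which four are used.
BASIS_LETTERS = ("A", "B", "C", "D")


def make_discrimination_example(fonts, basis_font, candidate_font, letter,
                                basis=BASIS_LETTERS):
    """Return (inputs, label) for the same-font discrimination task.

    fonts maps font name -> {letter -> glyph image}. The inputs are the four
    basis glyphs followed by the candidate glyph; the label is 1 when the
    candidate comes from the same font as the basis letters, else 0.
    """
    basis_glyphs = [fonts[basis_font][ch] for ch in basis]
    candidate = fonts[candidate_font][letter]
    label = 1 if basis_font == candidate_font else 0
    return basis_glyphs + [candidate], label


def make_synthesis_example(fonts, font, basis=BASIS_LETTERS):
    """Return (inputs, targets) for the synthesis task.

    The inputs are the four basis glyphs; the targets are the glyphs of every
    other letter in the same font, keyed by letter.
    """
    basis_glyphs = [fonts[font][ch] for ch in basis]
    targets = {ch: g for ch, g in fonts[font].items() if ch not in basis}
    return basis_glyphs, targets


# Toy stand-in data: each "glyph" is a 1x1 image holding a font-specific value.
toy_fonts = {
    name: {ch: [[value]] for ch in string.ascii_uppercase}
    for name, value in (("font_one", 0.0), ("font_two", 1.0))
}

pos_inputs, pos_label = make_discrimination_example(
    toy_fonts, "font_one", "font_one", "K")
neg_inputs, neg_label = make_discrimination_example(
    toy_fonts, "font_one", "font_two", "K")
syn_inputs, syn_targets = make_synthesis_example(toy_fonts, "font_one")
```

In this layout a network for the first task would see five glyphs and predict a single yes/no label, while a network for the second would see four glyphs and predict the remaining letters of the alphabet.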


Keywords: Style analysis, Typography, Image generation, Image synthesis, Machine learning



Please see Figure 11.

Compliance with ethical standards

Conflict of interest

The author declares that he has no conflict of interest.



Copyright information

© Springer-Verlag Berlin Heidelberg 2017

Authors and Affiliations

  1. Google, Inc., Mountain View, USA
