Quadtree Convolutional Neural Networks

  • Pradeep Kumar Jayaraman (corresponding author)
  • Jianhan Mei
  • Jianfei Cai
  • Jianmin Zheng
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11210)

Abstract

This paper presents a Quadtree Convolutional Neural Network (QCNN) for efficiently learning from image datasets representing sparse data such as handwriting, pen strokes, and freehand sketches. Instead of storing the sparse sketches in regular dense tensors, our method decomposes and represents the image as a linear quadtree that is refined only in the non-empty portions of the image. The actual image data corresponding to non-zero pixels is stored in the finest nodes of the quadtree. Convolution and pooling operations are restricted to the sparse pixels, leading to better efficiency in both computation time and memory usage. Specifically, the computational and memory costs in QCNN grow linearly in the number of non-zero pixels, as opposed to traditional CNNs, where the costs grow quadratically with the image resolution. This enables QCNN to learn from sparse images much faster and to process high-resolution images without the memory constraints faced by traditional CNNs. We study QCNN on four sparse image datasets for sketch classification and simplification tasks. The results show that QCNN can obtain comparable accuracy with a large reduction in computational and memory costs.
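
The following Python sketch is a minimal illustration of the two ideas in the abstract, not the authors' implementation: a linear quadtree that stores only the non-empty blocks of a sparse image via Morton (Z-order) codes, and a convolution evaluated only at non-zero pixels. The image size, block size, kernel, and all function names are illustrative assumptions.

    # Minimal sketch of the quadtree idea behind QCNN (not the authors' code).
    # Assumes a square binary image whose side is a power of two; the block
    # size, names, and 3x3 kernel are illustrative choices.
    import numpy as np

    def morton_encode(y, x):
        """Interleave the bits of (y, x) into a Morton (Z-order) code,
        the standard addressing scheme for linear quadtrees."""
        code = 0
        for i in range(16):
            code |= ((x >> i) & 1) << (2 * i)
            code |= ((y >> i) & 1) << (2 * i + 1)
        return code

    def linear_quadtree(img, block=4):
        """Return {morton_code: tile} for non-empty block x block tiles;
        empty regions of the image are simply never stored."""
        n = img.shape[0] // block
        tree = {}
        for by in range(n):
            for bx in range(n):
                tile = img[by*block:(by+1)*block, bx*block:(bx+1)*block]
                if tile.any():                  # refine only non-empty quadrants
                    tree[morton_encode(by, bx)] = tile.copy()
        return tree

    def sparse_conv3x3(img, kernel):
        """Cross-correlate (as is conventional in CNNs) only at non-zero
        pixels; work grows with the sparse support, not the image area."""
        out = np.zeros_like(img, dtype=float)
        pad = np.pad(img, 1)                    # zero padding on all sides
        ys, xs = np.nonzero(img)                # restrict work to non-zero pixels
        for y, x in zip(ys, xs):
            out[y, x] = np.sum(pad[y:y+3, x:x+3] * kernel)
        return out

    img = np.zeros((64, 64))
    img[10, 10:30] = 1.0                        # a short "pen stroke"
    tree = linear_quadtree(img)
    print(len(tree), "non-empty leaf blocks instead of", (64 // 4) ** 2)
    out = sparse_conv3x3(img, np.ones((3, 3)) / 9.0)

On the toy stroke above, only a handful of the 256 leaf blocks are ever stored or visited, which is the source of the linear-in-non-zero-pixels cost described in the abstract.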

Keywords

Quadtree · Neural network · Sparse convolution

Acknowledgements

We thank the anonymous reviewers for their constructive comments. This research is supported by the National Research Foundation under Virtual Singapore Award No. NRF2015VSG-AA3DCM001-018, and the BeingTogether Centre, a collaboration between Nanyang Technological University (NTU) Singapore and the University of North Carolina (UNC) at Chapel Hill. The BeingTogether Centre is supported by the National Research Foundation, Prime Minister's Office, Singapore, under its International Research Centres in Singapore Funding Initiative. This research is also supported in part by Singapore MoE Tier-2 Grant (MOE2016-T2-2-065).

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Pradeep Kumar Jayaraman¹ (corresponding author)
  • Jianhan Mei¹
  • Jianfei Cai¹
  • Jianmin Zheng¹

  1. Nanyang Technological University, Singapore, Singapore