
A Refined Spatial Transformer Network

  • Conference paper
Neural Information Processing (ICONIP 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11303)


Abstract

Spatial invariance to geometrically distorted data is of great importance in the vision and learning communities. The spatial transformer network (STN) addresses this problem in a computationally efficient manner: it is a differentiable module that can be inserted into a standard CNN architecture to spatially transform data. STN and its variants handle global deformation well but lack the ability to deal with local spatial variation, so achieving better spatial transformation within a neural network remains a pressing question. To address this issue, we design a module that estimates the difference between the ground truth and the STN output, measured as a motion field, and uses this motion field to refine the spatial transformation predicted by the STN. Experimental results show that our method outperforms state-of-the-art methods on the cluttered MNIST handwritten-digit classification task and on a planar image alignment task.
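The pipeline the abstract describes (a global warp predicted by an STN, followed by a dense motion field that corrects it locally) can be sketched in PyTorch as below. This is a minimal illustration, not the paper's exact design: the localization and refinement architectures, channel counts, and single-channel input are all placeholder assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RefinedSTN(nn.Module):
    """Sketch: global affine STN warp + local motion-field refinement."""

    def __init__(self):
        super().__init__()
        # Localization network: regresses the 6 affine parameters (standard STN).
        self.loc = nn.Sequential(
            nn.Conv2d(1, 16, 5), nn.MaxPool2d(2), nn.ReLU(),
            nn.Conv2d(16, 32, 5), nn.MaxPool2d(2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 6),
        )
        # Initialize to the identity transform, as in the original STN paper.
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))
        # Refinement head: predicts a per-pixel 2-channel motion field
        # (hypothetical architecture; the paper's module differs).
        self.refine = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 2, 3, padding=1),
        )

    def forward(self, x):
        # 1) Global affine warp predicted by the STN.
        theta = self.loc(x).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        coarse = F.grid_sample(x, grid, align_corners=False)
        # 2) A motion field estimated from the coarse result perturbs the
        #    sampling grid, refining the transformation locally.
        flow = self.refine(coarse).permute(0, 2, 3, 1)  # (N, H, W, 2)
        return F.grid_sample(x, grid + flow, align_corners=False)

out = RefinedSTN()(torch.randn(8, 1, 60, 60))  # e.g. cluttered-MNIST-sized input
```

Per the abstract, the motion field is supervised by the estimated difference between the ground truth and the STN output; the training loss is omitted from this structural sketch.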


Notes

  1. https://s3.amazonaws.com/lasagne/recipes/datasets/mnist_cluttered_60x60_6distortions.npz
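The cluttered MNIST archive referenced above can be loaded directly with NumPy. A minimal sketch follows; the key names inside the archive are assumptions based on common releases of this file, so inspect data.files first.

```python
import numpy as np
import urllib.request

# Cluttered MNIST (60x60, 6 distractors) used for the classification task.
URL = ("https://s3.amazonaws.com/lasagne/recipes/datasets/"
       "mnist_cluttered_60x60_6distortions.npz")
urllib.request.urlretrieve(URL, "mnist_cluttered.npz")

data = np.load("mnist_cluttered.npz")
print(data.files)  # inspect the archive's actual key names
# The keys below are assumptions; adjust to what data.files reports.
x_train = data["x_train"].reshape(-1, 1, 60, 60)  # flattened images -> NCHW
y_train = data["y_train"]                         # labels (one-hot in some releases)
```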


Acknowledgments

This work was supported by the National Natural Science Foundation of China (Nos. 61673381, 61201050, 61701497), the Scientific Instrument Developing Project of the Chinese Academy of Sciences (No. YZ201671), the Bureau of International Cooperation, CAS (No. 153D31KYSB20170059), and the Special Program of the Beijing Municipal Science & Technology Commission (No. Z161100000216146).

Author information

Correspondence to Hua Han.


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Shu, C., Chen, X., Yu, C., Han, H. (2018). A Refined Spatial Transformer Network. In: Cheng, L., Leung, A., Ozawa, S. (eds) Neural Information Processing. ICONIP 2018. Lecture Notes in Computer Science, vol. 11303. Springer, Cham. https://doi.org/10.1007/978-3-030-04182-3_14

  • DOI: https://doi.org/10.1007/978-3-030-04182-3_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-04181-6

  • Online ISBN: 978-3-030-04182-3

  • eBook Packages: Computer Science, Computer Science (R0)
