
A Refined Spatial Transformer Network

  • Conference paper
Neural Information Processing (ICONIP 2018)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11303)


Abstract

Spatial invariance to geometrically distorted data is of great importance in the vision and learning communities. The spatial transformer network (STN) addresses this problem in a computationally efficient manner: it is a differentiable module that can be inserted into a standard CNN architecture to spatially transform data. STN and its variants handle global deformation well but lack the ability to deal with local spatial variation, so achieving better spatial transformation within a neural network remains a pressing question. To address this issue, we design a module that estimates the difference between the ground truth and the STN output, measured as a motion field, and uses this motion field to refine the spatial transformation predicted by the STN. Experimental results show that our method outperforms state-of-the-art methods on the cluttered MNIST handwritten-digit classification task and on a planar image alignment task.
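The pipeline the abstract describes (a global warp predicted by an STN, followed by a dense motion field that corrects it locally) can be sketched in PyTorch as below. This is a minimal illustration, not the paper's exact design: the localization and refinement architectures, channel counts, and single-channel input are all placeholder assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RefinedSTN(nn.Module):
    """Sketch: global affine STN warp + local motion-field refinement."""

    def __init__(self):
        super().__init__()
        # Localization network: regresses the 6 affine parameters (standard STN).
        self.loc = nn.Sequential(
            nn.Conv2d(1, 16, 5), nn.MaxPool2d(2), nn.ReLU(),
            nn.Conv2d(16, 32, 5), nn.MaxPool2d(2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 6),
        )
        # Initialize to the identity transform, as in the original STN paper.
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))
        # Refinement head: predicts a per-pixel 2-channel motion field
        # (hypothetical architecture; the paper's module differs).
        self.refine = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 2, 3, padding=1),
        )

    def forward(self, x):
        # 1) Global affine warp predicted by the STN.
        theta = self.loc(x).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        coarse = F.grid_sample(x, grid, align_corners=False)
        # 2) A motion field estimated from the coarse result perturbs the
        #    sampling grid, refining the transformation locally.
        flow = self.refine(coarse).permute(0, 2, 3, 1)  # (N, H, W, 2)
        return F.grid_sample(x, grid + flow, align_corners=False)

out = RefinedSTN()(torch.randn(8, 1, 60, 60))  # e.g. cluttered-MNIST-sized input
```

Per the abstract, the motion field is supervised by the estimated difference between the ground truth and the STN output; the training loss is omitted from this structural sketch.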


Notes

  1. https://s3.amazonaws.com/lasagne/recipes/datasets/mnist_cluttered_60x60_6distortions.npz
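The cluttered MNIST archive referenced above can be loaded directly with NumPy. A minimal sketch follows; the key names inside the archive are assumptions based on common releases of this file, so inspect data.files first.

```python
import numpy as np
import urllib.request

# Cluttered MNIST (60x60, 6 distractors) used for the classification task.
URL = ("https://s3.amazonaws.com/lasagne/recipes/datasets/"
       "mnist_cluttered_60x60_6distortions.npz")
urllib.request.urlretrieve(URL, "mnist_cluttered.npz")

data = np.load("mnist_cluttered.npz")
print(data.files)  # inspect the archive's actual key names
# The keys below are assumptions; adjust to what data.files reports.
x_train = data["x_train"].reshape(-1, 1, 60, 60)  # flattened images -> NCHW
y_train = data["y_train"]                         # labels (one-hot in some releases)
```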


Acknowledgments

This work was supported by the National Natural Science Foundation of China (Nos. 61673381, 61201050, 61701497), the Scientific Instrument Developing Project of the Chinese Academy of Sciences (No. YZ201671), the Bureau of International Cooperation, CAS (No. 153D31KYSB20170059), and the Special Program of the Beijing Municipal Science & Technology Commission (No. Z161100000216146).

Author information

Correspondence to Hua Han.


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Shu, C., Chen, X., Yu, C., Han, H. (2018). A Refined Spatial Transformer Network. In: Cheng, L., Leung, A., Ozawa, S. (eds) Neural Information Processing. ICONIP 2018. Lecture Notes in Computer Science, vol. 11303. Springer, Cham. https://doi.org/10.1007/978-3-030-04182-3_14

  • DOI: https://doi.org/10.1007/978-3-030-04182-3_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-04181-6

  • Online ISBN: 978-3-030-04182-3

  • eBook Packages: Computer Science, Computer Science (R0)
