Journal of Computer Science and Technology, Volume 32, Issue 1, pp 110–121

Intelligent Visual Media Processing: When Graphics Meets Vision

  • Ming-Ming Cheng
  • Qi-Bin Hou
  • Song-Hai Zhang
  • Paul L. Rosin
Survey

Abstract

The computer graphics and computer vision communities have been working closely together in recent years, and a variety of algorithms and applications have been developed to analyze and manipulate the visual media around us. There are three major driving forces behind this phenomenon: 1) the availability of big data from the Internet has created a demand for dealing with the ever-increasing, vast amount of resources; 2) powerful processing tools, such as deep neural networks, provide effective ways for learning how to deal with heterogeneous visual data; 3) new data capture devices, such as the Kinect, bridge algorithms for 2D image understanding and 3D model analysis. These driving forces have emerged only recently, and we believe that the computer graphics and computer vision communities are still at the beginning of their honeymoon phase. In this work we survey recent research on how computer vision techniques benefit computer graphics techniques and vice versa, and cover research on analysis, manipulation, synthesis, and interaction. We also discuss existing problems and suggest possible further research directions.

Keywords

computer graphics, computer vision, survey, scene understanding, image manipulation

Notes

Acknowledgments

We would like to thank the anonymous reviewers for their useful feedback.

Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  • Ming-Ming Cheng (1)
  • Qi-Bin Hou (1)
  • Song-Hai Zhang (2)
  • Paul L. Rosin (1, 3)

  1. College of Computer Science and Control Engineering, Nankai University, Tianjin, China
  2. TNList, Tsinghua University, Beijing, China
  3. School of Computer Science and Informatics, Cardiff University, Wales, U.K.
