
TO-Scene: A Large-Scale Dataset for Understanding 3D Tabletop Scenes

  • Conference paper
  • First Online:
Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13687)


Abstract

Many everyday indoor activities, such as eating or writing, take place on tabletops of various kinds (e.g., coffee tables, writing desks). Understanding tabletop scenes is therefore indispensable for 3D indoor scene parsing applications. Unfortunately, this demand is hard to meet by directly deploying data-driven algorithms, since 3D tabletop scenes are rarely available in current datasets. To remedy this gap, we introduce TO-Scene, a large-scale dataset focusing on tabletop scenes, which contains 20,740 scenes across three variants. To acquire the data, we design an efficient and scalable framework, in which a crowdsourcing UI transfers CAD objects from ModelNet [43] and ShapeNet [5] onto tables from ScanNet [10]; the resulting tabletop scenes are then simulated into realistic scans and annotated automatically.

Further, we propose a tabletop-aware learning strategy to better perceive the small-sized tabletop instances. Notably, we also provide a real scanned test set, TO-Real, to verify the practical value of TO-Scene. Experiments show that algorithms trained on TO-Scene indeed work on the realistic test data, and that our tabletop-aware learning strategy greatly improves the state-of-the-art results on both 3D semantic segmentation and object detection tasks. Dataset and code are available at https://github.com/GAP-LAB-CUHK-SZ/TO-Scene.

M. Xu and P. Chen contributed equally.


References

  1. Ahmadyan, A., Zhang, L., Ablavatski, A., Wei, J., Grundmann, M.: Objectron: a large scale dataset of object-centric videos in the wild with pose annotations. In: CVPR (2021)

  2. Armeni, I., et al.: 3D semantic parsing of large-scale indoor spaces. In: CVPR (2016)

  3. Avetisyan, A., Dahnert, M., Dai, A., Savva, M., Chang, A.X., Nießner, M.: Scan2CAD: learning CAD model alignment in RGB-D scans. In: CVPR (2019)

  4. Chang, A., et al.: Matterport3D: learning from RGB-D data in indoor environments. In: 3DV (2017)

  5. Chang, A.X., et al.: ShapeNet: an information-rich 3D model repository. arXiv preprint arXiv:1512.03012 (2015)

  6. Choi, S., Zhou, Q.Y., Miller, S., Koltun, V.: A large dataset of object scans. arXiv preprint arXiv:1602.02481 (2016)

  7. Choy, C., Gwak, J., Savarese, S.: 4D spatio-temporal ConvNets: Minkowski convolutional neural networks. In: CVPR (2019)

  8. Collins, J., et al.: ABO: dataset and benchmarks for real-world 3D object understanding. arXiv preprint arXiv:2110.06199 (2021)

  9. Blender Online Community: Blender - a 3D modelling and rendering package. Blender Foundation, Stichting Blender Foundation, Amsterdam (2018). http://www.blender.org

  10. Dai, A., Chang, A.X., Savva, M., Halber, M., Funkhouser, T., Nießner, M.: ScanNet: richly-annotated 3D reconstructions of indoor scenes. In: CVPR (2017)

  11. Denninger, M., et al.: BlenderProc. arXiv preprint arXiv:1911.01911 (2019)

  12. Depierre, A., Dellandréa, E., Chen, L.: Jacquard: a large scale dataset for robotic grasp detection. In: IROS (2018)

  13. Fang, H.S., Wang, C., Gou, M., Lu, C.: GraspNet-1Billion: a large-scale benchmark for general object grasping. In: CVPR (2020)

  14. Fu, H., et al.: 3D-FRONT: 3D furnished rooms with layouts and semantics. In: CVPR (2021)

  15. Garcia-Garcia, A., et al.: The RobotriX: an extremely photorealistic and very-large-scale indoor dataset of sequences with robot trajectories and interactions. In: IROS (2018)

  16. Georgakis, G., Reza, M.A., Mousavian, A., Le, P., Kosecka, J.: Multiview RGB-D dataset for object instance detection. In: 3DV (2016)

  17. Girardeau-Montaut, D.: CloudCompare: 3D point cloud and mesh processing software. Open Source Project (2011)

  18. Graham, B., Engelcke, M., van der Maaten, L.: 3D semantic segmentation with submanifold sparse convolutional networks. In: CVPR (2018)

  19. Handa, A., Patraucean, V., Badrinarayanan, V., Stent, S., Cipolla, R.: SceneNet: understanding real world indoor scenes with synthetic data. In: CVPR (2016)

  20. Hu, W., Zhao, H., Jiang, L., Jia, J., Wong, T.T.: Bidirectional projection network for cross dimensional scene understanding. In: CVPR (2021)

  21. Hua, B.S., Pham, Q.H., Nguyen, D.T., Tran, M.K., Yu, L.F., Yeung, S.K.: SceneNN: a scene meshes dataset with annotations. In: 3DV (2016)

  22. Koch, S., et al.: ABC: a big CAD model dataset for geometric deep learning. In: CVPR (2019)

  23. Lai, K., Bo, L., Ren, X., Fox, D.: A large-scale hierarchical multi-view RGB-D object dataset. In: ICRA (2011)

  24. Lenz, I., Lee, H., Saxena, A.: Deep learning for detecting robotic grasps. IJRR 34, 705–724 (2015)

  25. Li, Z., et al.: OpenRooms: an open framework for photorealistic indoor scene datasets. In: CVPR (2021)

  26. Liu, Z., Hu, H., Cao, Y., Zhang, Z., Tong, X.: A closer look at local aggregation operators in point cloud analysis. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12368, pp. 326–342. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58592-1_20

  27. Liu, Z., Zhang, Z., Cao, Y., Hu, H., Tong, X.: Group-free 3D object detection via transformers. In: ICCV (2021)

  28. Miller, G.A.: WordNet: a lexical database for English. Commun. ACM 38, 39–41 (1995)

  29. Park, K., Rematas, K., Farhadi, A., Seitz, S.M.: PhotoShape: photorealistic materials for large-scale shape collections. ACM Trans. Graph. 37, 1–12 (2018)

  30. Qi, C.R., Litany, O., He, K., Guibas, L.J.: Deep Hough voting for 3D object detection in point clouds. In: ICCV (2019)

  31. Qi, C.R., Su, H., Mo, K., Guibas, L.J.: PointNet: deep learning on point sets for 3D classification and segmentation. In: CVPR (2017)

  32. Qi, C.R., Yi, L., Su, H., Guibas, L.J.: PointNet++: deep hierarchical feature learning on point sets in a metric space. In: NeurIPS (2017)

  33. Reizenstein, J., Shapovalov, R., Henzler, P., Sbordone, L., Labatut, P., Novotny, D.: Common objects in 3D: large-scale learning and evaluation of real-life 3D category reconstruction. In: ICCV (2021)

  34. Savva, M., Chang, A.X., Hanrahan, P., Fisher, M., Nießner, M.: PiGraphs: learning interaction snapshots from observations. ACM Trans. Graph. 35, 1–12 (2016)

  35. Silberman, N., Fergus, R.: Indoor scene segmentation using a structured light sensor. In: ICCV Workshops (2011)

  36. Silberman, N., Hoiem, D., Kohli, P., Fergus, R.: Indoor segmentation and support inference from RGBD images. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7576, pp. 746–760. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33715-4_54

  37. Singh, A., Sha, J., Narayan, K.S., Achim, T., Abbeel, P.: BigBIRD: a large-scale 3D database of object instances. In: ICRA (2014)

  38. Song, S., Lichtenberg, S.P., Xiao, J.: SUN RGB-D: a RGB-D scene understanding benchmark suite. In: CVPR (2015)

  39. Song, S., Yu, F., Zeng, A., Chang, A.X., Savva, M., Funkhouser, T.: Semantic scene completion from a single depth image. In: CVPR (2017)

  40. Thomas, H., Qi, C.R., Deschaud, J.E., Marcotegui, B., Goulette, F., Guibas, L.J.: KPConv: flexible and deformable convolution for point clouds. In: ICCV (2019)

  41. Torralba, A., Russell, B.C., Yuen, J.: LabelMe: online image annotation and applications. IJCV (2010)

  42. Uy, M.A., Pham, Q.H., Hua, B.S., Nguyen, D.T., Yeung, S.K.: Revisiting point cloud classification: a new benchmark dataset and classification model on real-world data. In: ICCV (2019)

  43. Wu, Z., et al.: 3D ShapeNets: a deep representation for volumetric shapes. In: CVPR (2015)

  44. Xiang, Y., et al.: ObjectNet3D: a large scale database for 3D object recognition. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 160–176. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_10

  45. Xiang, Y., Mottaghi, R., Savarese, S.: Beyond PASCAL: a benchmark for 3D object detection in the wild. In: WACV (2014)

  46. Xiao, J., Owens, A., Torralba, A.: SUN3D: a database of big spaces reconstructed using SfM and object labels. In: ICCV (2013)

  47. Xu, M., Ding, R., Zhao, H., Qi, X.: PAConv: position adaptive convolution with dynamic kernel assembling on point clouds. In: CVPR (2021)

  48. Xu, M., Zhang, J., Zhou, Z., Xu, M., Qi, X., Qiao, Y.: Learning geometry-disentangled representation for complementary understanding of 3D object point cloud. In: AAAI (2021)

  49. Zeng, A., Song, S., Nießner, M., Fisher, M., Xiao, J., Funkhouser, T.: 3DMatch: learning local geometric descriptors from RGB-D reconstructions. In: CVPR (2017)

  50. Zhang, Z., Sun, B., Yang, H., Huang, Q.: H3DNet: 3D object detection using hybrid geometric primitives. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12357, pp. 311–329. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58610-2_19

  51. Zhang, Z.: Microsoft Kinect sensor and its effect. IEEE MultiMedia (2012)

  52. Zhao, H., Jiang, L., Jia, J., Torr, P., Koltun, V.: Point transformer. In: ICCV (2021)

Acknowledgements

The work was supported in part by the National Key R&D Program of China with grant No. 2018YFB1800800, the Basic Research Project No. HZQB-KCZYZ-2021067 of Hetao Shenzhen-HK S&T Cooperation Zone, by Shenzhen Outstanding Talents Training Fund 202002, by Guangdong Research Projects No. 2017ZT07X152 and No. 2019CX01X104, and by the Guangdong Provincial Key Laboratory of Future Networks of Intelligence (Grant No. 2022B1212010001). It was also supported by NSFC-62172348, NSFC-61902334 and Shenzhen General Project (No. JCYJ20190814112007258). We also thank the High-Performance Computing Portal under the administration of the Information Technology Services Office at CUHK-Shenzhen.

Author information

Corresponding author

Correspondence to Xiaoguang Han.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 15,780 KB)


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Xu, M., Chen, P., Liu, H., Han, X. (2022). TO-Scene: A Large-Scale Dataset for Understanding 3D Tabletop Scenes. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13687. Springer, Cham. https://doi.org/10.1007/978-3-031-19812-0_20


  • DOI: https://doi.org/10.1007/978-3-031-19812-0_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19811-3

  • Online ISBN: 978-3-031-19812-0

  • eBook Packages: Computer Science, Computer Science (R0)
