Abstract
Recently, the Metaverse has become increasingly attractive, with millions of users accessing the many available virtual worlds. However, how do users find the Metaverse that best fits their current interests? So far, the search process has mostly relied on word of mouth or on advertisements on technology-oriented websites. The lack of search engines comparable to those available for other multimedia formats (e.g., YouTube for videos) is showing its limitations: with the available methods it is often cumbersome to find a Metaverse that matches specific interests, and it is difficult to discover user-created ones that lack strong advertising. To address this limitation, we propose to use natural language to describe the desired contents of the Metaverse a user wishes to find. Moreover, we highlight that, unlike more conventional 3D scenes, Metaverse scenarios represent a more complex data format, since they often contain one or more types of multimedia that influence the relevance of the scenario to a user query. Therefore, in this work we introduce a novel task, called Text-to-Metaverse retrieval, which aims to model these aspects while also taking the cross-modal relations with the textual data into account. Since we are the first to tackle this problem, we also collect a dataset of 33,000 Metaverses, each consisting of a 3D scene enriched with multimedia content. Finally, we design and implement a deep learning framework based on contrastive learning and assess it through a thorough experimental setup.
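The abstract names contrastive learning but does not specify the loss, the encoders, or how the multimedia content is combined with the 3D scene. The sketch below is therefore only an illustration under stated assumptions: a CLIP-style symmetric InfoNCE objective over paired text and Metaverse embeddings, a hypothetical `fuse_metaverse` helper that simply averages the scene embedding with the embeddings of the multimedia items it contains, and Recall@K as an example retrieval metric. The encoders that would produce the vectors are not shown; plain Python lists are used so the example runs with the standard library alone.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def fuse_metaverse(scene_emb: Sequence[float], media_embs: List[Sequence[float]]) -> Vector:
    """HYPOTHETICAL fusion: mean of the normalized 3D-scene embedding and the
    normalized embeddings of the multimedia items (images, videos, audio) it contains."""
    parts = [l2_normalize(scene_emb)] + [l2_normalize(m) for m in media_embs]
    dim = len(parts[0])
    mean = [sum(p[i] for p in parts) / len(parts) for i in range(dim)]
    return l2_normalize(mean)


def similarity_matrix(texts: List[Vector], metaverses: List[Vector]) -> List[Vector]:
    """Row i holds the similarity of text query i to every Metaverse."""
    return [[dot(t, m) for m in metaverses] for t in texts]


def _mean_diag_cross_entropy(logits: List[Vector]) -> float:
    """Mean over rows of -log softmax(row)[i], i.e. the i-th item is the positive."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_info_nce(texts: List[Vector], metaverses: List[Vector],
                       temperature: float = 0.07) -> float:
    """ASSUMED objective: average of text->Metaverse and Metaverse->text
    cross-entropy, where pair i (text i, Metaverse i) is the positive."""
    t = [l2_normalize(v) for v in texts]
    m = [l2_normalize(v) for v in metaverses]
    logits = [[s / temperature for s in row] for row in similarity_matrix(t, m)]
    logits_transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_mean_diag_cross_entropy(logits)
                  + _mean_diag_cross_entropy(logits_transposed))


def recall_at_k(sim: List[Vector], k: int) -> float:
    """Fraction of queries whose ground-truth Metaverse (same index) is in the top k."""
    hits = 0
    for i, row in enumerate(sim):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        hits += i in ranked[:k]
    return hits / len(sim)


if __name__ == "__main__":
    texts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    metaverses = [list(v) for v in texts]
    print(symmetric_info_nce(texts, metaverses))  # small: pairs are aligned
    print(recall_at_k(similarity_matrix(texts, metaverses), k=1))
```

With aligned pairs the loss is close to zero, and it grows as positives are pushed away from their captions; averaging the two directions is one common design choice that makes the embedding space usable both for text-to-Metaverse search and for the reverse direction.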
Acknowledgments
This work was supported by the Department Strategic Plan (PSD) of the University of Udine, Interdepartmental Project on Artificial Intelligence (2020–25), by the MUR Progetti di Ricerca di Rilevante Interesse Nazionale (PRIN) 2022 (project code 2022YTE579), and by TechStar Srl, Italy. We also thank Beatrice Portelli for helping with the illustrations and for the useful feedback during the preparation of this work.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Abdari, A., Falcon, A., Serra, G. (2024). A Language-Based Solution to Enable Metaverse Retrieval. In: Rudinac, S., et al. MultiMedia Modeling. MMM 2024. Lecture Notes in Computer Science, vol 14556. Springer, Cham. https://doi.org/10.1007/978-3-031-53311-2_35
DOI: https://doi.org/10.1007/978-3-031-53311-2_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53310-5
Online ISBN: 978-3-031-53311-2
eBook Packages: Computer Science, Computer Science (R0)