Title: Multimodal transformer-driven consistent environment design generation simulation modelling
Authors: Zhuo Fan; Jinqi Wang
Addresses: College of Art and Design, NanNing University, Nanning, 530200, China; College of Art and Design, NanNing University, Nanning, 530200, China
Abstract: Automated generation of physically plausible 3D environments is a key challenge in digital twins, the metaverse, and robot simulation. Current methods focus mainly on visual fidelity and often overlook functional and physical rationality, which limits their direct applicability to simulation tasks. To address this, we propose a multimodal transformer-based framework for environment design generation and consistency simulation. Using a cross-modal attention mechanism, the model integrates textual descriptions with prior knowledge from real 3D scenes. During training, it incorporates fine-grained physical constraint losses (collision avoidance, support relations, and spatial accessibility optimisation) to model physical consistency explicitly. Experiments on the Matterport3D dataset show that the method outperforms existing baselines in visual quality and layout rationality. It also shows significant gains in physical consistency: collision volume is greatly reduced and the navigation success rate reaches 89%, indicating that the generated environments are highly simulable and practical.
Keywords: multimodal transformer; environment design generation; physical consistency; simulation modelling; microphysically constrainable.
DOI: 10.1504/IJSPM.2026.152091
International Journal of Simulation and Process Modelling, 2026 Vol.23 No.1, pp.34 - 43
Received: 14 Oct 2025
Accepted: 19 Nov 2025
Published online: 06 Mar 2026
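
Illustrative note: the abstract reports reduced collision volume but does not define how it is measured or how the collision avoidance loss is formulated. The following is a minimal sketch of one plausible measure, assuming each object is approximated by an axis-aligned bounding box given as a (min corner, max corner) pair. The function names `overlap_volume` and `collision_penalty` are hypothetical and are not taken from the paper, and this plain-Python version is not a differentiable training loss.

```python
from itertools import combinations


def overlap_volume(a, b):
    """Intersection volume of two axis-aligned boxes.

    Each box is ((xmin, ymin, zmin), (xmax, ymax, zmax)).
    This box representation is an assumption made for illustration.
    """
    (amin, amax), (bmin, bmax) = a, b
    volume = 1.0
    for lo_a, hi_a, lo_b, hi_b in zip(amin, amax, bmin, bmax):
        # Overlap length along this axis; non-positive means no intersection.
        extent = min(hi_a, hi_b) - max(lo_a, lo_b)
        if extent <= 0:
            return 0.0
        volume *= extent
    return volume


def collision_penalty(boxes):
    """Sum of pairwise intersection volumes over all objects in a layout."""
    return sum(overlap_volume(a, b) for a, b in combinations(boxes, 2))


if __name__ == "__main__":
    layout = [
        ((0, 0, 0), (2, 2, 2)),  # overlaps the next box in a unit cube
        ((1, 1, 1), (3, 3, 3)),
        ((5, 5, 5), (6, 6, 6)),  # isolated object
    ]
    print(collision_penalty(layout))
```

A layout with no interpenetrating objects scores zero under this measure, so lower values correspond to fewer physical conflicts; boxes that only touch at a face also contribute nothing.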


