Fig. 6 | International Journal of Computer Vision

From: Talk2Nav: Long-Range Vision-and-Language Navigation with Dual Attention and Spatial Memory

Illustrative diagram of our model. A soft attention mechanism over the segmented language instructions is controlled by the Indicator Controller. Two matching modules match the segmented landmark description against the visual image and the local directional instruction against the memory image. The action module fuses features from the visual observation, the memory image, the language segments, and the two matching scores to predict an action at each step. The agent then moves to the next node, updates the visual observation and the memory image, and repeats until it reaches the destination.
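The per-step computation the caption describes can be sketched as below. This is a minimal NumPy illustration, not the paper's implementation: all dimensions, the dot-product form of the attention, the cosine matching, and the linear action head are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

# Hypothetical sizes (assumptions for the sketch)
D = 16       # feature dimension
N_SEG = 4    # number of language instruction segments
N_ACT = 5    # number of discrete actions

seg_feats = rng.standard_normal((N_SEG, D))  # segmented instruction features
visual = rng.standard_normal(D)              # current visual observation features
memory = rng.standard_normal(D)              # spatial memory image features

# Indicator Controller: soft attention over the instruction segments
attn = softmax(seg_feats @ visual)
landmark_desc = attn @ seg_feats             # attended landmark description
direction_instr = seg_feats[attn.argmax()]   # stand-in for the local directional instruction

# Two matching modules: landmark vs. visual image, direction vs. memory image
m_vis = cosine(landmark_desc, visual)
m_mem = cosine(direction_instr, memory)

# Action module: fuse observation, memory, language features, and matching scores
fused = np.concatenate([visual, memory, landmark_desc, direction_instr, [m_vis, m_mem]])
W = rng.standard_normal((N_ACT, fused.size))  # stand-in for a learned action head
action = int(np.argmax(W @ fused))
```

In the full model each of these components is learned end to end; here random features and weights only make the data flow between the modules concrete.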
