SemOpenAlex

Matches in SemOpenAlex for { <https://semopenalex.org/work/W4365441196> ?p ?o ?g. }
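The pattern above could be written as a SPARQL query along the following lines. This is only a sketch: the helper name `build_subject_query` and the `LIMIT` default are illustrative assumptions, and the graph variable `?g` is left out because every row in this listing sits in the default graph.

```python
# Sketch only: build a SPARQL SELECT that lists every predicate/object
# pair for one subject IRI. The function name and the limit default are
# assumptions made for illustration, not part of SemOpenAlex itself.
WORK_IRI = "https://semopenalex.org/work/W4365441196"


def build_subject_query(subject_iri: str, limit: int = 100) -> str:
    """Return a SPARQL query string for all statements about one subject."""
    return (
        "SELECT ?p ?o WHERE {\n"
        f"  <{subject_iri}> ?p ?o .\n"
        "}\n"
        f"LIMIT {limit}"
    )


query = build_subject_query(WORK_IRI)
print(query)
```

The resulting string can be submitted to any SPARQL endpoint that serves this data; the endpoint address is deliberately not hard-coded here.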

Showing items 1 to 67 of 67 with 100 items per page.

W4365441196 abstract "Vision-and-Language Navigation (VLN) is the task that requires an agent to navigate through the environment based on natural language instructions. At each step, the agent takes the next action by selecting from a set of navigable locations. In this paper, we aim to take one step further and explore whether the agent can benefit from generating the potential future view during navigation. Intuitively, humans will have an expectation of how the future environment will look like, based on the natural language instructions and surrounding views, which will aid correct navigation. Hence, to equip the agent with this ability to generate the semantics of future navigation views, we first propose three proxy tasks during the agent's in-domain pre-training: Masked Panorama Modeling (MPM), Masked Trajectory Modeling (MTM), and Action Prediction with Image Generation (APIG). These three objectives teach the model to predict missing views in a panorama (MPM), predict missing steps in the full trajectory (MTM), and generate the next view based on the full instruction and navigation history (APIG), respectively. We then fine-tune the agent on the VLN task with an auxiliary loss that minimizes the difference between the view semantics generated by the agent and the ground truth view semantics of the next step. Empirically, our VLN-SIG achieves the new state-of-the-art on both the Room-to-Room dataset and the CVDN dataset. We further show that our agent learns to fill in missing patches in future views qualitatively, which brings more interpretability over agents' predicted actions. Lastly, we demonstrate that learning to predict future view semantics also enables the agent to have better performance on longer paths." @default.
W4365441196 created "2023-04-15" @default.
W4365441196 creator A5001987532 @default.
W4365441196 creator A5087164997 @default.
W4365441196 date "2023-04-10" @default.
W4365441196 modified "2023-09-28" @default.
W4365441196 title "Improving Vision-and-Language Navigation by Generating Future-View Image Semantics" @default.
W4365441196 doi "https://doi.org/10.48550/arxiv.2304.04907" @default.
W4365441196 hasPublicationYear "2023" @default.
W4365441196 type Work @default.
W4365441196 citedByCount "0" @default.
W4365441196 crossrefType "posted-content" @default.
W4365441196 hasAuthorship W4365441196A5001987532 @default.
W4365441196 hasAuthorship W4365441196A5087164997 @default.
W4365441196 hasBestOaLocation W43654411961 @default.
W4365441196 hasConcept C107457646 @default.
W4365441196 hasConcept C121332964 @default.
W4365441196 hasConcept C1276947 @default.
W4365441196 hasConcept C13662910 @default.
W4365441196 hasConcept C154945302 @default.
W4365441196 hasConcept C162324750 @default.
W4365441196 hasConcept C177264268 @default.
W4365441196 hasConcept C184337299 @default.
W4365441196 hasConcept C187736073 @default.
W4365441196 hasConcept C195324797 @default.
W4365441196 hasConcept C199360897 @default.
W4365441196 hasConcept C204321447 @default.
W4365441196 hasConcept C2780451532 @default.
W4365441196 hasConcept C2780580889 @default.
W4365441196 hasConcept C2780791683 @default.
W4365441196 hasConcept C2781067378 @default.
W4365441196 hasConcept C41008148 @default.
W4365441196 hasConcept C62520636 @default.
W4365441196 hasConceptScore W4365441196C107457646 @default.
W4365441196 hasConceptScore W4365441196C121332964 @default.
W4365441196 hasConceptScore W4365441196C1276947 @default.
W4365441196 hasConceptScore W4365441196C13662910 @default.
W4365441196 hasConceptScore W4365441196C154945302 @default.
W4365441196 hasConceptScore W4365441196C162324750 @default.
W4365441196 hasConceptScore W4365441196C177264268 @default.
W4365441196 hasConceptScore W4365441196C184337299 @default.
W4365441196 hasConceptScore W4365441196C187736073 @default.
W4365441196 hasConceptScore W4365441196C195324797 @default.
W4365441196 hasConceptScore W4365441196C199360897 @default.
W4365441196 hasConceptScore W4365441196C204321447 @default.
W4365441196 hasConceptScore W4365441196C2780451532 @default.
W4365441196 hasConceptScore W4365441196C2780580889 @default.
W4365441196 hasConceptScore W4365441196C2780791683 @default.
W4365441196 hasConceptScore W4365441196C2781067378 @default.
W4365441196 hasConceptScore W4365441196C41008148 @default.
W4365441196 hasConceptScore W4365441196C62520636 @default.
W4365441196 hasLocation W43654411961 @default.
W4365441196 hasOpenAccess W4365441196 @default.
W4365441196 hasPrimaryLocation W43654411961 @default.
W4365441196 hasRelatedWork W1509467138 @default.
W4365441196 hasRelatedWork W1541271503 @default.
W4365441196 hasRelatedWork W159132833 @default.
W4365441196 hasRelatedWork W180507639 @default.
W4365441196 hasRelatedWork W2081647779 @default.
W4365441196 hasRelatedWork W2766598384 @default.
W4365441196 hasRelatedWork W3107474891 @default.
W4365441196 hasRelatedWork W3185852197 @default.
W4365441196 hasRelatedWork W325548290 @default.
W4365441196 hasRelatedWork W1872130062 @default.
W4365441196 isParatext "false" @default.
W4365441196 isRetracted "false" @default.
W4365441196 workType "article" @default.
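
The rows above all follow the shape `subject predicate object @graph.`, where the object is either a quoted literal or a bare identifier. A minimal sketch of how such rows might be parsed is shown below; the `Row` type, the `parse_row` helper, and the regular expression are assumptions that fit this listing's layout, not a general RDF parser.

```python
import re
from typing import NamedTuple, Optional


class Row(NamedTuple):
    subject: str
    predicate: str
    obj: str
    is_literal: bool  # True when the object was a quoted string
    graph: str


# Assumed layout: subject, predicate, then a quoted literal or a bare
# token, then "@graph." at the end of the line.
ROW_RE = re.compile(r'^(\S+) (\S+) (?:"(.*)"|(\S+)) @(\S+)\.$', re.DOTALL)


def parse_row(line: str) -> Optional[Row]:
    """Parse one listing row; return None if it does not fit the layout."""
    m = ROW_RE.match(line.strip())
    if m is None:
        return None
    subject, predicate, literal, term, graph = m.groups()
    if literal is not None:
        return Row(subject, predicate, literal, True, graph)
    return Row(subject, predicate, term, False, graph)


sample = [
    'W4365441196 created "2023-04-15" @default.',
    'W4365441196 hasConcept C41008148 @default.',
    'W4365441196 isRetracted "false" @default.',
]
rows = [parse_row(s) for s in sample]
for row in rows:
    print(row)
```

Keeping `is_literal` separate from `obj` preserves the distinction between string values such as `"false"` and references to other entities such as `C41008148`.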