Abstract
Enabling robots to follow natural language commands in large environments is challenging, as current mapping methods consume excessive memory. We introduce LAMP, a new navigation framework that uses an implicit neural field to learn a continuous, language-driven map of its surroundings. By combining a coarse search on a sparse graph with fine-grained, gradient-based optimization in the learned field, LAMP can precisely guide a robot to its goal. Our experiments show that this approach is significantly more memory-efficient and accurate than existing methods, opening new possibilities for scalable, language-aware robots.
Method
LAMP introduces a novel approach to language-driven robot navigation by learning an implicit neural field that continuously encodes language features across large-scale environments. Our method consists of three key components that work together to enable memory-efficient and precise navigation.
System Overview
(a) Implicit Language Map Construction: The robot traverses the environment and collects pairs of camera poses x and corresponding images I. Our neural network FΘ maps each pose x to a language embedding z = FΘ(x). Since processing the full large-scale topological graph is computationally expensive, we sample the graph 𝒢 using our proposed score-based optimization for coarse planning.
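A pose-to-embedding field of this kind can be sketched with random Fourier features and a closed-form ridge-regression head standing in for the paper's actual neural network FΘ; all dimensions, data, and the encoding below are toy placeholders, not the published architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def fourier_features(x, B):
    # positional encoding commonly used for implicit fields
    proj = x @ B.T
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)

# toy data: N poses (x, y, yaw) with 16-dim "CLIP-like" unit embeddings
N, d_pose, d_emb = 200, 3, 16
poses = rng.uniform(-1, 1, (N, d_pose))
embeds = rng.normal(size=(N, d_emb))
embeds /= np.linalg.norm(embeds, axis=1, keepdims=True)

B = rng.normal(size=(64, d_pose)) * 4.0   # random Fourier basis
Phi = fourier_features(poses, B)          # (N, 128) encoded poses
lam = 1e-3                                # ridge regularizer
# closed-form linear head fitted on the encoded poses
W = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ embeds)

def F_theta(x):
    """Toy stand-in for the field: pose -> unit language embedding."""
    z = fourier_features(x, B) @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)
```

The key property this illustrates is that the map is queried by evaluating a function of the pose, rather than by looking up a stored vector per location.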
(b) Coarse Path Planning: Given a natural language query from the user, such as "red oak tree", we encode it into a goal embedding and run A* search on the sampled graph 𝒢 to obtain a coarse path to the node whose stored embedding best matches the goal.
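The coarse stage can be sketched as selecting the graph node with the highest cosine similarity to the goal embedding and running A* toward it; the graph layout, edge costs, and heuristic below are illustrative assumptions, not the paper's exact formulation:

```python
import heapq
import numpy as np

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def coarse_plan(graph, node_emb, node_pos, start, goal_emb):
    """graph: node -> [(neighbor, edge_cost)]; returns node path or None."""
    # pick the goal node whose stored embedding best matches the query
    goal = max(node_emb, key=lambda n: cos(node_emb[n], goal_emb))
    # A* with a Euclidean heuristic over the sampled graph
    h = lambda n: float(np.linalg.norm(node_pos[n] - node_pos[goal]))
    open_set = [(h(start), 0.0, start, [start])]
    best_g = {}
    while open_set:
        f, g, n, path = heapq.heappop(open_set)
        if n == goal:
            return path
        if best_g.get(n, float("inf")) <= g:
            continue  # already expanded with an equal or cheaper cost
        best_g[n] = g
        for m, w in graph[n]:
            heapq.heappush(open_set, (g + w + h(m), g + w, m, path + [m]))
    return None
```

A usage example on a three-node chain: with embeddings where node "c" best matches the query, `coarse_plan` returns the path from "a" through "b" to "c".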
(c) Fine Path Generation: Starting from the final coarse pose, we refine the pose via gradient-based optimization in the learned field FΘ, maximizing the cosine similarity between FΘ(x) and the goal embedding to reach a fine pose that offers a clear view of the target object.
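The refinement step amounts to gradient ascent on cosine similarity as a function of the pose. The sketch below uses finite-difference gradients so it works with any black-box field; the actual method differentiates through the neural field, and the step size and iteration count here are arbitrary placeholders:

```python
import numpy as np

def refine_pose(F, x0, z_goal, steps=100, lr=0.05, eps=1e-4):
    """Move the pose to maximize cos(F(x), z_goal) by gradient ascent."""
    def sim(x):
        z = F(x)
        return float(z @ z_goal / (np.linalg.norm(z) * np.linalg.norm(z_goal)))

    x = np.array(x0, dtype=float)
    for _ in range(steps):
        # central finite-difference gradient of the similarity w.r.t. pose
        g = np.zeros_like(x)
        for i in range(x.size):
            d = np.zeros_like(x)
            d[i] = eps
            g[i] = (sim(x + d) - sim(x - d)) / (2 * eps)
        x += lr * g  # ascend toward higher similarity
    return x
```

For instance, with a toy field F(x) = (cos x, sin x) and goal embedding (1, 0), the similarity is maximized at x = 0, and `refine_pose` drives an initial pose of 0.5 toward it.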
Key Innovations
- Implicit Language Field: Unlike existing methods that explicitly store language vectors at every location, LAMP encodes language features as a continuous neural field, dramatically reducing memory requirements while maintaining fine-grained representation capability.
- Bayesian Uncertainty Modeling: We adopt a von Mises-Fisher distribution to model embedding uncertainty, improving robustness when predicting language features for unobserved poses and reducing the impact of noisy CLIP embeddings.
- Graph Sampling Strategy: Our method employs a novel node selection approach that combines view coverage, uncertainty scores, and semantic sensitivity to retain only the most informative nodes, enabling efficient large-scale navigation.
- Two-Stage Path Planning: LAMP combines coarse graph-based planning with fine-grained gradient-based optimization in the learned field, achieving both global navigation capability and precise goal reaching.
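One common parameterization of the von Mises-Fisher distribution over unit embeddings, written here as a hedged sketch since the paper's exact formulation may differ:

```latex
p(\mathbf{z} \mid \mathbf{x})
  = C_d\!\bigl(\kappa(\mathbf{x})\bigr)\,
    \exp\!\bigl(\kappa(\mathbf{x})\,\boldsymbol{\mu}(\mathbf{x})^{\top}\mathbf{z}\bigr),
\qquad
C_d(\kappa) = \frac{\kappa^{d/2-1}}{(2\pi)^{d/2}\, I_{d/2-1}(\kappa)},
```

where μ(x) is the predicted mean direction on the unit sphere, κ(x) ≥ 0 is the concentration (small κ signals high uncertainty at unobserved poses), and I_ν is the modified Bessel function of the first kind. Under this parameterization, training minimizes the negative log-likelihood −κ(x) μ(x)ᵀz − log C_d(κ(x)) over observed pose-embedding pairs.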
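The node selection described above can be sketched as a weighted sum of per-node scores followed by a top-k cut; the per-criterion weights, min-max normalization, and score inputs below are illustrative assumptions rather than the paper's exact procedure:

```python
import numpy as np

def sample_nodes(coverage, uncertainty, sensitivity, k,
                 w_cov=1.0, w_unc=1.0, w_sem=1.0):
    """Keep the k highest-scoring nodes under a hypothetical weighted score."""
    def norm(v):
        # min-max normalize each criterion so the weights are comparable
        v = np.asarray(v, dtype=float)
        span = v.max() - v.min()
        return (v - v.min()) / span if span > 0 else np.zeros_like(v)

    score = (w_cov * norm(coverage)
             + w_unc * norm(uncertainty)
             + w_sem * norm(sensitivity))
    return np.argsort(score)[::-1][:k]  # indices of the k best nodes
```

For example, with three nodes whose scores are uniformly highest for node 0 and lowest for node 1, keeping k=2 retains nodes 0 and 2.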
Limitations
While LAMP demonstrates promising results in large-scale language-driven navigation, it has limitations that should be considered:
- CLIP vector dependency: Our method's performance is inherently tied to the quality and accuracy of CLIP embeddings. The navigation success varies significantly depending on the semantic clarity and distinctiveness of the target query. Objects with ambiguous visual features or similar appearance to other items may lead to incorrect node selection and suboptimal navigation outcomes.
- Data-intensive training requirements: Since we implicitly model the pose-to-CLIP vector relationship through a neural network, our approach requires extensive training data to learn this complex mapping effectively. The implicit representation demands a large number of pose-image pairs to achieve robust performance, which may be challenging to collect in some environments.
Our paper has been accepted to IEEE Robotics and Automation Letters (RA-L). The final version will be available soon.
BibTeX
@article{lee2025lamp,
title={{LAMP}: Implicit Language Map for Robot Navigation},
author={Lee, Sibaek and Yu, Hyeonwoo and Kim, Giseop and Choi, Sunwook},
journal={IEEE Robotics and Automation Letters (RA-L)},
year={2025},
note={Accepted for publication. Final details (volume, pages, DOI) to be updated.},
url={https://tempabcdabcd/abcd/abcd}
}