Demo scenes: Real-World Outdoor, Real-World Indoor, Replica Office0, Replica Office1, TUM 2, TUM 3, ScanNet 0169, ScanNet 0000
Abstract
LEGO-SLAM is the first framework to achieve real-time, open-vocabulary mapping within a SLAM system based on 3D Gaussian Splatting (3DGS). By distilling high-dimensional language embeddings into a compact, scene-adaptive 16-dimensional feature space, we drastically reduce memory usage and run at 15 FPS. Our approach includes a language-guided pruning strategy that significantly reduces the number of Gaussians without quality loss, along with an efficient loop detection method that reuses the mapping features for robust tracking in novel environments.
3D Relevancy Maps
System Overview
The system takes RGB-D input to concurrently perform tracking and mapping. The Tracking module estimates camera poses via geometric alignment (G-ICP) directly against the 3D Gaussian map. The Mapping module constructs the scene using 3D Gaussians enriched with compact 16-dimensional language features, distilled via our scene-adaptive encoder to ensure real-time performance. To maintain map efficiency, the system incorporates Language-Guided Pruning to remove semantically redundant primitives. Furthermore, it employs Language-Based Loop Detection to correct long-term drift by efficiently reusing the compact mapping features, minimizing computational overhead.
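As a rough illustration of how compact mapping features could be reused for loop detection, the sketch below summarizes each keyframe by the mean of its 16-dimensional language features and compares keyframes by cosine similarity. This is not the authors' implementation: the page does not describe the actual descriptor or matching rule, so the mean pooling, the cosine test, and the names `mean_descriptor`, `detect_loop`, `min_gap`, and `threshold` are all assumptions made for illustration.

```python
import math
from typing import List, Optional, Sequence

FEATURE_DIM = 16  # compact language-feature size stated in the overview


def mean_descriptor(features: Sequence[Sequence[float]]) -> List[float]:
    """Average per-Gaussian features into one unit-length keyframe descriptor.

    Mean pooling is an assumption; the page does not say how features
    are aggregated per keyframe.
    """
    if not features:
        raise ValueError("need at least one feature vector")
    dim = len(features[0])
    sums = [0.0] * dim
    for feat in features:
        if len(feat) != dim:
            raise ValueError("all feature vectors must share one dimension")
        for i, value in enumerate(feat):
            sums[i] += value
    mean = [s / len(features) for s in sums]
    norm = math.sqrt(sum(v * v for v in mean))
    # Leave an all-zero descriptor untouched to avoid dividing by zero.
    return mean if norm == 0.0 else [v / norm for v in mean]


def detect_loop(
    query: Sequence[float],
    history: Sequence[Sequence[float]],
    min_gap: int = 30,       # hypothetical: skip temporally adjacent keyframes
    threshold: float = 0.9,  # hypothetical: minimum cosine similarity
) -> Optional[int]:
    """Return the index of the most similar earlier keyframe, or None.

    `history[i]` is the unit-length descriptor of keyframe i, and the query
    is treated as keyframe len(history). Keyframes closer than `min_gap`
    to the query are ignored so that neighbors do not count as loops.
    """
    best_index: Optional[int] = None
    best_similarity = threshold
    for i, descriptor in enumerate(history):
        if len(history) - i < min_gap:
            continue
        # Descriptors are unit length, so the dot product is the cosine.
        similarity = sum(q * d for q, d in zip(query, descriptor))
        if similarity >= best_similarity:
            best_index, best_similarity = i, similarity
    return best_index
```

Because the descriptors are only 16 values per keyframe, the comparison against the whole keyframe history is a handful of multiply-adds per candidate, which is the kind of low overhead the overview attributes to reusing mapping features. A real system would follow a detected candidate with geometric verification before correcting drift.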
BibTeX
@article{lee2025lego,
title={LEGO-SLAM: Language-Embedded Gaussian Optimization SLAM},
author={Lee, Sibaek and Ha, Seongbo and Kang, Kyeongsu and Choi, Joonyeol and Tak, Seungjun and Yu, Hyeonwoo},
journal={arXiv preprint arXiv:2511.16144},
year={2025}
}