Scenes shown: Real-World Outdoor, Real-World Indoor, Replica, TUM, ScanNet
Abstract
LEGO-SLAM is the first framework to achieve real-time, open-vocabulary mapping within a 3DGS-based SLAM system. By distilling high-dimensional language embeddings into a compact, scene-adaptive 16-dimensional feature space, we drastically reduce memory usage and enable 15 FPS performance. Our approach includes a language-guided pruning strategy that significantly reduces Gaussian counts without quality loss, along with an efficient loop detection method that reuses mapping features for robust tracking in novel environments.
System Overview
The system takes RGB-D input and performs tracking and mapping concurrently. The Tracking module estimates camera poses by geometric alignment, using generalized ICP (G-ICP) directly against the 3D Gaussian map. The Mapping module builds the scene from 3D Gaussians, each carrying a compact 16-dimensional language feature distilled by our scene-adaptive encoder so that mapping stays real-time. To keep the map small, Language-Guided Pruning removes semantically redundant primitives. Language-Based Loop Detection corrects long-term drift by reusing the same compact mapping features, which keeps its computational overhead low.
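As a rough illustration of two ideas in this overview (compressing language embeddings to 16 dimensions, and reusing the compact features for loop detection), the Python sketch below uses PCA as a stand-in for the scene-adaptive encoder, which this page does not specify. The mean-pooled keyframe descriptor, the cosine-similarity test, and the names `fit_linear_codec`, `keyframe_descriptor`, `detect_loop`, `min_gap`, and `threshold` are all assumptions made for illustration, not LEGO-SLAM's implementation.

```python
import numpy as np


def fit_linear_codec(features, dim=16):
    """Fit a PCA encoder/decoder on one scene's language features.

    PCA is a stand-in (assumption) for the learned scene-adaptive encoder.
    features: (N, D) array of high-dimensional language embeddings.
    """
    mean = features.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    basis = vt[:dim]  # (dim, D)

    def encode(x):
        # Project onto the top `dim` principal directions.
        return (x - mean) @ basis.T

    def decode(z):
        # Map compact features back to the original embedding space.
        return z @ basis + mean

    return encode, decode


def keyframe_descriptor(compact_features):
    """Mean-pool a keyframe's compact features and L2-normalise (assumption)."""
    d = compact_features.mean(axis=0)
    return d / (np.linalg.norm(d) + 1e-12)


def detect_loop(descriptors, query_idx, min_gap=10, threshold=0.9):
    """Return the index of the most similar earlier keyframe, or None.

    descriptors: list of unit-length vectors, one per keyframe.
    min_gap:     number of most recent keyframes to skip (hypothetical).
    threshold:   minimum cosine similarity to accept a loop (hypothetical).
    """
    last = query_idx - min_gap
    if last < 0:
        return None  # not enough history to form a loop
    candidates = np.stack(descriptors[: last + 1])
    sims = candidates @ descriptors[query_idx]  # cosine similarity
    best = int(np.argmax(sims))
    return best if sims[best] >= threshold else None
```

If the original embeddings had, say, 512 dimensions (an assumed figure, not stated on this page), storing 16 values per Gaussian instead would cut the feature memory by a factor of 32. A typical call sequence would be `encode, decode = fit_linear_codec(scene_features)` once per scene, then `keyframe_descriptor(encode(frame_features))` for each keyframe.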
Quantitative Results
Comparison of tracking accuracy (absolute trajectory error, ATE; lower is better) and rendering quality (peak signal-to-noise ratio, PSNR; higher is better) against processing speed (FPS) across different SLAM methods. LEGO-SLAM achieves competitive accuracy and quality.
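For readers unfamiliar with the two metrics, the sketch below shows how ATE (as an RMSE over camera positions) and PSNR are commonly computed. It is a generic illustration, not the evaluation code behind these results: it assumes the estimated and ground-truth poses are already associated by timestamp, uses a rigid Kabsch alignment without scale, and the function names `align_rigid`, `ate_rmse`, and `psnr` are hypothetical.

```python
import numpy as np


def align_rigid(estimated, ground_truth):
    """Rigidly align estimated positions to ground truth (Kabsch, no scale).

    Both inputs are (N, 3) arrays of time-associated camera positions.
    """
    mu_e = estimated.mean(axis=0)
    mu_g = ground_truth.mean(axis=0)
    # 3x3 cross-covariance between the centred point sets.
    h = (estimated - mu_e).T @ (ground_truth - mu_g)
    u, _, vt = np.linalg.svd(h)
    # Guard against a reflection so the result is a proper rotation.
    sign = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T
    trans = mu_g - rot @ mu_e
    return estimated @ rot.T + trans


def ate_rmse(estimated, ground_truth, align=True):
    """Root-mean-square translational error, in the units of the input."""
    if align:
        estimated = align_rigid(estimated, ground_truth)
    err = np.linalg.norm(estimated - ground_truth, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))


def psnr(rendered, reference, max_value=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_value]."""
    diff = rendered.astype(np.float64) - reference.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

Alignment matters for ATE because a SLAM trajectory is expressed in an arbitrary world frame; without it, a constant offset between frames would be counted as tracking error.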
BibTeX
@article{lee2025lego,
title={LEGO-SLAM: Language-Embedded Gaussian Optimization SLAM},
author={Lee, Sibaek and Ha, Seongbo and Kang, Kyeongsu and Choi, Joonyeol and Tak, Seungjun and Yu, Hyeonwoo},
journal={arXiv preprint arXiv:2511.16144},
year={2025}
}