🧱 LEGO-SLAM: Language-Embedded Gaussian Optimization SLAM

Sungkyunkwan University

Abstract

LEGO-SLAM is the first framework to achieve real-time, open-vocabulary mapping within a 3D Gaussian Splatting (3DGS) based SLAM system. By distilling high-dimensional language embeddings into a compact, scene-adaptive 16-dimensional feature space, we drastically reduce memory usage and run at 15 FPS. Our approach includes a language-guided pruning strategy that significantly reduces the number of Gaussians without quality loss, along with an efficient loop detection method that reuses the mapping features for robust tracking in novel environments.
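To make the idea of a compact, scene-adaptive feature space concrete, the sketch below compresses synthetic embeddings to 16 dimensions with a PCA projection fitted on one scene's data. This is only an illustrative stand-in: the actual LEGO-SLAM encoder is not described here, and the 512-dimensional input size, the synthetic data, and the function names `fit_linear_encoder`, `encode`, and `decode` are all assumptions.

```python
import numpy as np

def fit_linear_encoder(embeddings, dim=16):
    """Fit a PCA projection on one scene's embeddings.

    Hypothetical stand-in for a learned scene-adaptive encoder.
    """
    mean = embeddings.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(embeddings - mean, full_matrices=False)
    return mean, vt[:dim]

def encode(embeddings, mean, basis):
    """Project high-dimensional embeddings to compact codes."""
    return (embeddings - mean) @ basis.T

def decode(codes, mean, basis):
    """Map compact codes back to the original embedding space."""
    return codes @ basis + mean

rng = np.random.default_rng(0)
# Synthetic "scene": 2000 embeddings of assumed dimension 512 that lie
# close to a 10-dimensional subspace, plus a little noise.
latent = rng.normal(size=(2000, 10))
mixing = rng.normal(size=(10, 512))
embeddings = latent @ mixing + 0.01 * rng.normal(size=(2000, 512))

mean, basis = fit_linear_encoder(embeddings, dim=16)
codes = encode(embeddings, mean, basis)
recon = decode(codes, mean, basis)
rel_err = np.linalg.norm(recon - embeddings) / np.linalg.norm(embeddings)
print(codes.shape, f"relative reconstruction error: {rel_err:.4f}")
```

Under the assumed 512-dimensional input, storing 16 values per Gaussian instead of 512 cuts the per-Gaussian feature storage by a factor of 32, which is the kind of saving the abstract refers to. A learned encoder can exploit nonlinear structure that this linear projection cannot.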

3D Relevancy Maps

Figure: 3D relevancy maps comparing our method with LangSplat for the queries "bicycle", "chair", "grass", and "wall".
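A relevancy map assigns each Gaussian a score for a given text query. As a rough sketch, the score can be taken as the cosine similarity between a Gaussian's language feature and the query embedding expressed in the same feature space. The scoring rule, the toy 3-dimensional features, and the function name `relevancy` are assumptions for illustration, not the scoring used to produce the maps shown on this page.

```python
import numpy as np

def relevancy(features, query):
    """Cosine similarity between each feature row and one query vector."""
    unit_features = features / np.linalg.norm(features, axis=1, keepdims=True)
    unit_query = query / np.linalg.norm(query)
    return unit_features @ unit_query

# Toy per-Gaussian features (3-dimensional for readability).
features = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
])
# Toy query embedding, assumed to live in the same space as the features.
query = np.array([1.0, 0.1, 0.0])

scores = relevancy(features, query)
best = int(np.argmax(scores))
print(scores, best)
```

Coloring each Gaussian by its score and rendering the scene gives a heat map over the query's most likely locations.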

System Overview

Figure: LEGO-SLAM overview.

The system takes RGB-D input to concurrently perform tracking and mapping. The Tracking module estimates camera poses via geometric alignment (G-ICP) directly against the 3D Gaussian map. The Mapping module constructs the scene using 3D Gaussians enriched with compact 16-dimensional language features, distilled via our scene-adaptive encoder to ensure real-time performance. To maintain map efficiency, the system incorporates Language-Guided Pruning to remove semantically redundant primitives. Furthermore, it employs Language-Based Loop Detection to correct long-term drift by efficiently reusing the compact mapping features, minimizing computational overhead.
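The two language-driven components can be sketched in a few lines each. The code below is a minimal illustration under stated assumptions, not the LEGO-SLAM implementation: pruning is approximated by dropping a Gaussian when a Gaussian already kept in the same voxel has a nearly identical feature, and loop detection by comparing a mean-feature descriptor of the current keyframe against older keyframes. The voxel size, the thresholds, the `min_gap` parameter, and all function names are hypothetical.

```python
import numpy as np

def _unit(rows):
    """Normalize each row to unit length."""
    return rows / np.linalg.norm(rows, axis=1, keepdims=True)

def prune_redundant(positions, features, voxel=0.1, sim_thresh=0.95):
    """Return indices of Gaussians to keep.

    A Gaussian is dropped if a kept Gaussian in the same voxel has a
    feature with cosine similarity >= sim_thresh (assumed criterion).
    """
    unit = _unit(features)
    cells = np.floor(positions / voxel).astype(np.int64)
    kept_by_cell = {}
    keep = []
    for i, cell in enumerate(map(tuple, cells)):
        kept = kept_by_cell.setdefault(cell, [])
        if all(unit[i] @ unit[j] < sim_thresh for j in kept):
            kept.append(i)
            keep.append(i)
    return np.array(keep)

def keyframe_descriptor(features):
    """Unit-length mean of the unit features visible in a keyframe."""
    mean = _unit(features).mean(axis=0)
    return mean / np.linalg.norm(mean)

def detect_loop(descriptors, current, min_gap=2, sim_thresh=0.9):
    """Return the index of the best-matching old keyframe, or None.

    The most recent min_gap keyframes are skipped so that neighbors in
    time are not reported as loops.
    """
    candidates = descriptors[: max(len(descriptors) - min_gap, 0)]
    if len(candidates) == 0:
        return None
    sims = np.stack(candidates) @ current
    best = int(np.argmax(sims))
    return best if sims[best] >= sim_thresh else None

# Pruning demo: the first three Gaussians share a voxel; the second one
# nearly duplicates the first one's feature and is removed.
positions = np.array([[0.01, 0.01, 0.01], [0.02, 0.02, 0.02],
                      [0.03, 0.03, 0.03], [0.55, 0.55, 0.55]])
features = np.array([[1.0, 0.0], [0.99, 0.05], [0.0, 1.0], [1.0, 0.0]])
keep = prune_redundant(positions, features)

# Loop demo: four stored keyframe descriptors, then a revisit of keyframe 0.
history = [np.array([1.0, 0.0]), np.array([0.0, 1.0]),
           np.array([0.6, 0.8]), np.array([-1.0, 0.0])]
current = keyframe_descriptor(np.array([[1.0, 0.0], [0.9, 0.1]]))
match = detect_loop(history, current)
no_match = detect_loop(history, np.array([0.6, 0.8]))
print(keep, match, no_match)
```

Because the descriptor is built from the same compact features the map already stores, this kind of loop check needs no separate place-recognition network, which is the efficiency argument made above.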

BibTeX

@article{lee2025lego,
  title={LEGO-SLAM: Language-Embedded Gaussian Optimization SLAM},
  author={Lee, Sibaek and Ha, Seongbo and Kang, Kyeongsu and Choi, Joonyeol and Tak, Seungjun and Yu, Hyeonwoo},
  journal={arXiv preprint arXiv:2511.16144},
  year={2025}
}