🧱 LEGO-SLAM: Language-Embedded Gaussian Optimization SLAM

Sungkyunkwan University

Abstract

LEGO-SLAM is the first framework to achieve real-time, open-vocabulary mapping within a SLAM system based on 3D Gaussian Splatting (3DGS). By distilling high-dimensional language embeddings into a compact, scene-adaptive 16-dimensional feature space, we substantially reduce memory usage and reach 15 FPS. Our approach also includes a language-guided pruning strategy that reduces the number of Gaussians without loss of quality, and an efficient loop detection method that reuses the mapping features for robust tracking in novel environments.
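To make the idea of a compact, scene-adaptive feature space concrete, the sketch below compresses per-scene embeddings to 16 dimensions with plain PCA. This is only an illustrative stand-in: the actual encoder in LEGO-SLAM is learned, and the 512-dimensional input size, the synthetic data, and the function names `fit_linear_codec`, `encode`, and `decode` are all assumptions made for this example.

```python
import numpy as np

def fit_linear_codec(embeddings: np.ndarray, dim: int = 16):
    """Fit a PCA basis on one scene's embeddings (a stand-in for a learned encoder)."""
    mean = embeddings.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(embeddings - mean, full_matrices=False)
    return mean, vt[:dim]

def encode(x: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Project high-dimensional embeddings onto the compact basis."""
    return (x - mean) @ basis.T

def decode(z: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Map compact codes back to the original embedding space."""
    return z @ basis + mean

# Hypothetical scene: 1000 embeddings of size 512 that lie near an 8-dim subspace,
# mimicking the assumption that one scene only uses a small part of the language space.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 8))
mixing = rng.normal(size=(8, 512))
embeddings = latent @ mixing + 0.01 * rng.normal(size=(1000, 512))

mean, basis = fit_linear_codec(embeddings, dim=16)
codes = encode(embeddings, mean, basis)   # shape (1000, 16)
recon = decode(codes, mean, basis)        # shape (1000, 512)
rel_err = np.linalg.norm(recon - embeddings) / np.linalg.norm(embeddings)
print(codes.shape, f"relative reconstruction error: {rel_err:.4f}")
```

Because the basis is fitted per scene, the 16 stored values per Gaussian only need to cover the concepts that actually appear in that scene, which is the intuition behind trading a generic high-dimensional embedding for a scene-adaptive one.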

System Overview

Figure: LEGO-SLAM overview.

The system takes RGB-D input and performs tracking and mapping concurrently. The Tracking module estimates camera poses via geometric alignment (Generalized ICP, G-ICP) directly against the 3D Gaussian map. The Mapping module constructs the scene using 3D Gaussians enriched with compact 16-dimensional language features, distilled via our scene-adaptive encoder to ensure real-time performance. To keep the map efficient, the system uses Language-Guided Pruning to remove semantically redundant primitives. It also uses Language-Based Loop Detection to correct long-term drift; because this step reuses the compact mapping features, it adds little computational overhead.
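As a rough illustration of how compact mapping features could be reused for loop detection, the sketch below mean-pools 16-dimensional features into one descriptor per keyframe and compares descriptors by cosine similarity. The pooling scheme, the `min_gap` and `threshold` parameters, and the function names are assumptions for this example and are not taken from LEGO-SLAM itself.

```python
import numpy as np

def keyframe_descriptor(features: np.ndarray) -> np.ndarray:
    """Mean-pool (N, 16) features into a single unit-length descriptor."""
    d = features.mean(axis=0)
    return d / (np.linalg.norm(d) + 1e-12)

def find_loop_candidate(descriptors, query_idx, min_gap=10, threshold=0.9):
    """Return the index of the most similar earlier keyframe, or None.

    Keyframes closer than `min_gap` to the query are skipped, since
    neighbouring frames look alike without being loop closures.
    """
    last = query_idx - min_gap
    if last < 0:
        return None
    past = np.stack(descriptors[: last + 1])
    sims = past @ descriptors[query_idx]  # cosine similarity (unit vectors)
    best = int(np.argmax(sims))
    return best if sims[best] >= threshold else None

# Hypothetical sequence: four distinct places, then a revisit of place 0.
rng = np.random.default_rng(0)
place_ids = [0, 1, 2, 3, 0]
descriptors = []
for p in place_ids:
    base = np.zeros(16)
    base[p] = 1.0  # each place dominates a different feature channel
    feats = base + 0.05 * rng.normal(size=(200, 16))
    descriptors.append(keyframe_descriptor(feats))

loop_for_revisit = find_loop_candidate(descriptors, query_idx=4, min_gap=2)
loop_for_new_place = find_loop_candidate(descriptors, query_idx=3, min_gap=2)
print(loop_for_revisit, loop_for_new_place)  # 0 None
```

Since the descriptors come from features the mapper already maintains, a check like this needs no separate place-recognition network; a real system would follow a detected candidate with geometric verification before correcting the trajectory.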

Quantitative Results

Figures: ATE vs. FPS comparison; PSNR vs. FPS comparison.

Comparison of tracking accuracy (absolute trajectory error, ATE) and rendering quality (peak signal-to-noise ratio, PSNR) against processing speed (frames per second, FPS) across different SLAM methods. LEGO-SLAM achieves competitive tracking accuracy and rendering quality.
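For readers unfamiliar with the two metrics, the following sketch shows how ATE (as an RMSE over translational errors) and PSNR are commonly computed. It is a simplified version under stated assumptions: the trajectories are taken to be already time-associated and aligned in a common frame, images are floats in [0, 1], and the function names are chosen for this example.

```python
import numpy as np

def ate_rmse(est_xyz: np.ndarray, gt_xyz: np.ndarray) -> float:
    """Root-mean-square translational error between (N, 3) trajectories.

    Assumes both trajectories are already aligned and time-associated.
    """
    err = np.linalg.norm(est_xyz - gt_xyz, axis=1)
    return float(np.sqrt(np.mean(err ** 2)))

def psnr(rendered: np.ndarray, reference: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means a closer rendering."""
    mse = np.mean((rendered - reference) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))

# Toy data: every estimated pose is offset by 5 cm (3 cm in x, 4 cm in y).
gt = np.zeros((4, 3))
est = gt + np.array([0.03, 0.04, 0.0])
ate = ate_rmse(est, gt)

# Toy images: a uniform error of 0.1 on a [0, 1] intensity scale.
reference = np.zeros((2, 2, 3))
rendered = reference + 0.1
quality = psnr(rendered, reference)
print(f"ATE RMSE: {ate:.3f} m, PSNR: {quality:.1f} dB")  # ATE RMSE: 0.050 m, PSNR: 20.0 dB
```

Lower ATE and higher PSNR are better, so in plots of this kind a method is favorable when it combines a high FPS with a low ATE or a high PSNR.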

BibTeX

@article{lee2025lego,
  title={LEGO-SLAM: Language-Embedded Gaussian Optimization SLAM},
  author={Lee, Sibaek and Ha, Seongbo and Kang, Kyeongsu and Choi, Joonyeol and Tak, Seungjun and Yu, Hyeonwoo},
  journal={arXiv preprint arXiv:2511.16144},
  year={2025}
}