CSIRO Data61
IntentFuse is a lightweight middleware that grounds natural language queries in 3D scenes by connecting a compact language model to a pretrained Language Embedded Radiance Field (LERF). It reformulates free-form queries into structured prompts and handles affordances and negations without extra training. Experiments show clear gains over querying LERF directly, enabling intuitive affordance grounding for robotics and AR/VR exploration.
IntentFuse Query Engine overview. The Query Evaluator extracts key roles from natural language, the Context Provider resolves ambiguities using scene priors, and the structured output is passed to the LERF engine for precise 3D grounding.
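The flow in this overview can be illustrated with a small, self-contained Python sketch. It is not the IntentFuse implementation: keyword rules stand in for the compact language model, and every name in it (`Roles`, `StructuredPrompt`, `SCENE_PRIORS`, `evaluate_query`, `resolve_context`, `ground_query`) is hypothetical, as are the scene labels and tags. It only shows how a free-form query with an affordance or a negation might become positive and negative prompts before any call to LERF, which is left out here.

```python
import re
from dataclasses import dataclass

# Hypothetical scene priors: labels assumed to exist in the scene, each with
# illustrative attribute/affordance tags. These are not from the paper.
SCENE_PRIORS = {
    "clock": {"tell time"},
    "wooden chair": {"wooden", "sit"},
    "soft toy": {"soft", "cuddle"},
    "desk lamp": {"light"},
}

# Assumed negation cues; a real system would rely on the language model.
NEGATION_CUES = r"\b(?:unlike|not|except|without)\b"


@dataclass
class Roles:
    """Key roles pulled out of a query by the (toy) Query Evaluator."""
    affordance: str = ""
    attribute: str = ""
    obj: str = ""
    excluded: str = ""


@dataclass
class StructuredPrompt:
    """Prompts a radiance-field query engine could consume."""
    positives: list
    negatives: list


def evaluate_query(query: str) -> Roles:
    """Toy Query Evaluator: split off the negated part, then classify the rest."""
    text = query.strip().rstrip(".").lower()
    parts = re.split(NEGATION_CUES, text, maxsplit=1)
    wanted = parts[0].strip(" ,")
    excluded = parts[1].strip(" ,") if len(parts) > 1 else ""
    excluded = re.sub(r"^(?:a|an|the)\s+", "", excluded)

    # "something to/for <verb phrase>" is treated as an affordance.
    match = re.match(r"^something (?:to|for) (.+)$", wanted)
    if match:
        return Roles(affordance=match.group(1), excluded=excluded)
    # "something <adjective>" is treated as an attribute.
    match = re.match(r"^something (.+)$", wanted)
    if match:
        return Roles(attribute=match.group(1), excluded=excluded)
    # Anything else is taken as a direct object description.
    return Roles(obj=wanted, excluded=excluded)


def resolve_context(roles: Roles, priors: dict) -> StructuredPrompt:
    """Toy Context Provider: map affordances/attributes to known scene labels."""
    key = roles.affordance or roles.attribute
    if key:
        positives = sorted(label for label, tags in priors.items() if key in tags)
        if not positives:  # nothing known in the scene: fall back to the phrase
            positives = [key]
    else:
        positives = [roles.obj]
    negatives = [roles.excluded] if roles.excluded else []
    positives = [p for p in positives if p not in negatives]
    return StructuredPrompt(positives=positives, negatives=negatives)


def ground_query(query: str, priors: dict = SCENE_PRIORS) -> StructuredPrompt:
    """Evaluator followed by Context Provider; the LERF call itself is omitted."""
    return resolve_context(evaluate_query(query), priors)


if __name__ == "__main__":
    for q in ("Something to tell time",
              "Something wooden, unlike a soft toy.",
              "Desk lamp"):
        print(q, "->", ground_query(q))
```

Under these assumed priors, "Something wooden, unlike a soft toy." resolves to the positive prompt `wooden chair` and the negative prompt `soft toy`. How the negative prompt is then used inside the radiance-field query is not specified here and is not modelled by the sketch.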
"Something to tell time"
"Something wooden, unlike a soft toy."
"Decorative pillow with tree silhouette in cream and brown."
"Desk lamp"
@article{ravendran2025intentfuse,
  title={IntentFuse: Language-Guided 3D Scene Understanding via Prompt Filtering and Fusion},
  author={Ravendran, Ahalya and Perera, Madhawa and Xu, Feng and Petersson, Lars and Wang, Dadong and Li, Xun},
  journal={International Conference on Digital Image Computing: Techniques and Applications},
  year={2025}
}