At this week’s MCQLL meeting, Ben LeBrun will present “Incremental and Systematic Visually-Grounded Language Understanding using Modular Symbolic Representations.”
All are welcome to attend.
- Speaker: Ben LeBrun
- Title: Incremental and Systematic Visually-Grounded Language Understanding using Modular Symbolic Representations
Humans relate language to the external world systematically and incrementally, inferring systematic mappings between visual and linguistic input on a word-by-word basis (Tanenhaus et al. 1995, Eberhard et al. 1995). In contrast, existing models of visually-grounded language understanding have no notion of incrementality and often fail to behave systematically (Conwell & Ullman 2022, Ruis et al. 2020). In this talk, I will show that incrementality and systematicity are made possible by a model in which visual and linguistic input are tied together via modular symbolic representations of linguistic meaning and 3D visual scenes. I will present models that can (a) generate 3D visual scenes from natural language more systematically than existing approaches and (b) incrementally ground natural language in 3D visual scenes as precisely as humans.