At this week’s lab meeting, Jacob will be presenting on the connection between grammatical structure and the statistics of word occurrences in language use.

  • Wednesday, October 21, at 13:30 (Montréal time, UTC-4).
  • Meetings are held via Zoom. If you would like to attend and have not already signed up for the MCQLL mailing list, please fill out this Google form.

Abstract

There is an intuitive connection between grammatical structure and the statistics of word occurrences observed in language use. This intuition is reflected in cognitive models and also in NLP, in the assumption that patterns of predictability correlate with linguistic structure. We will call this general idea the *dependency-dependence* hypothesis. The hypothesis is implicit in the use of language modelling objectives to train modern neural models, and has been stated explicitly in some approaches to unsupervised dependency parsing. Its strongest version says that compositional structure is in fact entirely reducible to co-occurrence statistics (a hypothesis made explicit in Futrell et al. 2019). In this talk I will describe a study that uses pretrained contextualized embedding models to estimate the mutual information between pairs of words, and shows that the structure that is optimal for prediction is not very closely correlated with compositional structure. We propose that contextualized mutual information scores of this kind may be a useful way to understand the structure of predictability, as a system distinct from compositional structure but also integral to language use.
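The abstract does not spell out how the "optimal structure for prediction" is extracted from pairwise scores. One common way to turn pairwise word-association scores into something comparable to a dependency parse is to take a maximum spanning tree over the words and count how many gold dependency arcs it recovers. The sketch below assumes that approach and assumes the pairwise scores (for example, contextualized PMI estimates) have already been computed. The function names and the toy scores are hypothetical illustrations, not the study's code or results.

```python
from typing import List, Set, Tuple

Edge = Tuple[int, int]  # undirected edge between word positions, stored as (smaller, larger)


def max_spanning_tree(scores: List[List[float]]) -> Set[Edge]:
    """Prim's algorithm for a maximum-weight spanning tree over n words.

    Assumption: scores[i][j] is a symmetric pairwise association score
    (e.g. an estimated PMI between words i and j); higher means more associated.
    """
    n = len(scores)
    in_tree = {0}
    edges: Set[Edge] = set()
    while len(in_tree) < n:
        # Pick the highest-scoring edge linking a word in the tree to a word outside it.
        best = max(
            ((i, j) for i in in_tree for j in range(n) if j not in in_tree),
            key=lambda e: scores[e[0]][e[1]],
        )
        in_tree.add(best[1])
        edges.add((min(best), max(best)))
    return edges


def undirected_accuracy(predicted: Set[Edge], gold: Set[Edge]) -> float:
    """Fraction of gold dependency arcs (direction ignored) present in the predicted tree."""
    return len(predicted & gold) / len(gold)


# Hypothetical toy example: "the cat sat down" (positions 0..3) with invented scores.
scores = [
    [0.0, 3.0, 1.5, 0.2],
    [3.0, 0.0, 1.0, 0.1],
    [1.5, 1.0, 0.0, 2.0],
    [0.2, 0.1, 2.0, 0.0],
]
gold = {(0, 1), (1, 2), (2, 3)}  # the-cat, cat-sat, sat-down

tree = max_spanning_tree(scores)
print(sorted(tree))  # → [(0, 1), (0, 2), (2, 3)]
print(f"{undirected_accuracy(tree, gold):.3f}")  # → 0.667
```

In this invented example the score-optimal tree attaches "the" to "sat" instead of linking "cat" to "sat", so it recovers only two of the three gold arcs; this mismatch between a prediction-optimal tree and the gold parse is the kind of comparison the abstract describes, here shown only schematically.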

Bio

Jacob is a PhD student at McGill Linguistics / Mila. He is broadly interested in logic, mathematical linguistics, the generative / expressive capacity of formal systems, and information theory, as well as in examining what both human and machine learning might be able to tell us about the underlying structure of language.