At this week’s lab meeting, Jacob will present an ongoing project on Compositionality and Predictability.

  • Wednesday, May 6th, at 14:30 UTC-4
  • Via Zoom. Contact Emily for details.


Compositionality is perhaps the most fundamental property of linguistic structure. There is an (often tacit) assumption in NLP that the underlying structure may be expected to correlate with the structure that optimizes predictability. The strong version of this correlation hypothesis is that compositional structure is in fact entirely reducible to cooccurrence statistics. With mutual information between heads and dependents as a measure of predictability, this strong correlation hypothesis is proposed explicitly in Futrell et al., 2019.

In the past few years, contextualized word embedding models have taken the field of NLP by storm, revolutionizing performance on a number of downstream tasks which seem to implicate syntactic knowledge. These models, trained on prediction-based language modelling objectives, provide natural tools to explore the question of how much the patterns of predictability and cooccurrence correlate with compositional structure. In this work we make use of contextual embedding models to estimate a measure of pointwise mutual information between words given context, and show that the hypothesis that compositional structure can be recovered directly by optimizing for predictability does not hold. We then break down the differences between compositional syntactic structures and structures optimized for predictability, in order to begin to explain why these two kinds of structure overlap when they do.
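
For readers unfamiliar with the quantity mentioned above, here is a minimal sketch of pointwise mutual information computed from a small joint distribution that is assumed to be already conditioned on a fixed context. It is not the estimation procedure used in the project: the function name `conditional_pmi`, the word pairs, and the probabilities are all hypothetical, chosen only to illustrate the textbook definition log p(x, y | c) / (p(x | c) p(y | c)).

```python
import math

def conditional_pmi(joint, x, y):
    """PMI (in bits) between outcomes x and y.

    `joint` maps (x, y) pairs to probabilities and is assumed to be
    a joint distribution already conditioned on some fixed context.
    """
    p_xy = joint[(x, y)]
    # Marginalize the joint to get p(x | c) and p(y | c).
    p_x = sum(p for (a, _), p in joint.items() if a == x)
    p_y = sum(p for (_, b), p in joint.items() if b == y)
    return math.log2(p_xy / (p_x * p_y))

# Hypothetical toy distribution, not data from the project.
toy_joint = {
    ("strong", "tea"): 0.4,
    ("strong", "computer"): 0.1,
    ("powerful", "tea"): 0.1,
    ("powerful", "computer"): 0.4,
}

pmi_attracted = conditional_pmi(toy_joint, "strong", "tea")
pmi_repelled = conditional_pmi(toy_joint, "strong", "computer")
```

In this toy distribution, pairs that cooccur more often than their marginals predict receive positive PMI and pairs that cooccur less often receive negative PMI. In the work described here, the probabilities would instead be estimated by a contextual embedding model.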


Jacob Louis Hoover is a second-year PhD student in linguistics at McGill University / Mila. His research interests involve syntax, semantics, mathematical linguistics, and information theory.