This week’s lab meeting will feature a talk from Emily Goodwin.

  • Thursday, March 25, 13:30–14:30 (Montréal time, UTC-4).
  • Meetings are via Zoom. If you are interested in attending any of the meetings this semester, please take a moment now to register at this link. After approval, you will receive a confirmation email with the link to join.

Abstract

Recent attention in neural natural language understanding models has focused on generalization that is compositional (the meanings of larger expressions are a function of the meanings of smaller expressions) and systematic (individual words mean the same thing when put in novel combinations). Datasets for compositional and systematic generalization often focus on testing classes of syntactic constructions (testing only on strings of a certain length or longer, or novel combinations of particular predicates). In contrast, the compositional freebase queries (CFQ) training and test sets are automatically sampled. To measure the compositional challenge of a test set relative to its training set, they measure the divergence between the distribution of syntactic compounds in test and train. Training and test splits with maximum compound divergence (MCD) are highly challenging for semantic parsers, but (unlike other datasets designed to test compositional generalization) the splits do not specifically hold-out human-recognizable classes of syntactic constructions from the training set. In this talk I will present preliminary results of a syntactic analyses of the MCD splits released in the CFQ dataset, and explore whether model failures on MCD splits can be explained in terms of phenomena familiar to syntactic theory.

Bio

Emily is a second-year masters student in the McGill linguistics department.