At this week’s lab meeting, Maria Ryskina will give a talk titled ‘Learning Computational Models of Non-Standard Language.’

  • Tuesday, February 8, 15:00–16:00 (Montreal time, UTC-5).
  • Meetings are via Zoom. If you would like to attend the talk but have not yet signed up for the MCQLL meetings this semester, please send an email to mcqllmeetings@gmail.com.

Abstract

Non-standard linguistic items, such as novel words or creative spellings, are common in domains like social media and pose challenges for automatically processing text from these domains. To build models capable of processing such innovative items, we need to not only understand how humans reason about non-standard language, but also be able to operationalize this knowledge to create useful inductive biases. In this talk, I will present empirical studies of several phenomena under the umbrella of non-standard language, modeled at the levels of granularity ranging from individual users to entire dialects. First, I will show how idiosyncratic spelling preferences reveal information about the user, with an application to the bibliographic task of identifying typesetters of historical printed documents. Second, I will discuss the common patterns in user-specific orthographies and demonstrate that incorporating these patterns helps with unsupervised conversion of idiosyncratically romanized text into the native orthography of the language. In the final part of the talk, I will focus on word emergence in a dialect as a whole and present a diachronic corpus study modeling the language-internal and language-external factors that drive neology.