One of the most important ideas of modern linguistics is that the way people understand, learn, and use language can be studied with mathematically precise, computational models. In recent years, such models have found their way into nearly every aspect of our lives, powering much of our interaction with our phones, tablets, and computers. Computational linguistics refers to the set of methods and tools used to develop such models.

These methods are generally deployed in one of two ways. First, they can be used scientifically, as an aid to understanding the consequences of the complex assumptions that are often made in theories of language learning or use. By formalizing hypotheses, we can deduce outcomes that would be difficult or impossible to work out without simulation or proof, and we can often derive new kinds of predictions from our theories. Second, such implemented computational models are increasingly important as engineering tools in applications across the spectrum of technology.

At McGill, research in computational linguistics spans both scientific applications here in the linguistics department and engineering applications in the university more broadly, with strong links between the two. Here in the department, we use a variety of computational tools and methods to study fundamental questions about language acquisition, processing, use in society, and change over time.

A strong focus of the research in the department is on structured probabilistic models of phonology, morphology, and syntax that make critical use of linguistic theories in these domains. We are also interested in how computational models of linguistic structure can interface with computational models of other areas of cognition, such as vision, intuitive physics, folk psychology, and audition. To build such models we often draw on the framework of probabilistic programming languages, but we also make use of a wide variety of other areas of machine learning and artificial intelligence. Our research also extends to other areas of computational linguistics, such as formal language and automata theory and dynamical systems models of language variation and change.
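As a minimal, hypothetical illustration of what a structured probabilistic model of syntax might look like, the Python sketch below samples sentences from a toy probabilistic context-free grammar. The grammar, rules, and probabilities are invented for this example and are not drawn from any of the department's projects.

```python
import random

# A toy probabilistic context-free grammar: each nonterminal maps to a list of
# (right-hand side, probability) pairs. Grammar and probabilities are invented
# purely for illustration.
PCFG = {
    "S":  [(["NP", "VP"], 1.0)],
    "NP": [(["D", "N"], 0.7), (["N"], 0.3)],
    "VP": [(["V", "NP"], 0.6), (["V"], 0.4)],
    "D":  [(["the"], 0.6), (["a"], 0.4)],
    "N":  [(["linguist"], 0.5), (["model"], 0.5)],
    "V":  [(["builds"], 0.5), (["tests"], 0.5)],
}

def sample(symbol):
    """Recursively expand a symbol, choosing rules in proportion to their probability."""
    if symbol not in PCFG:          # terminal symbol: return the word itself
        return [symbol]
    rhss, probs = zip(*PCFG[symbol])
    rhs = random.choices(rhss, weights=probs, k=1)[0]
    words = []
    for child in rhs:
        words.extend(sample(child))
    return words

if __name__ == "__main__":
    print(" ".join(sample("S")))    # e.g. "the linguist builds a model"
```

A full probabilistic program would also define how to condition such a model on observed data (for example, inferring rule probabilities from a corpus), but even this small sketch shows how grammatical structure and probability can be combined in one model.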

Our department has a special interest in formal tools and techniques for understanding the acoustic and articulatory realization of phonetics and phonology. In order to carry out “big data” studies of phonetics and phonology, we also build and apply tools for querying and analyzing large speech datasets. These tools use contemporary databases, machine learning, and speech recognition technology, in line with our more general interest in adapting powerful methods from computer science to analyze linguistic data. Some projects integrate and query large speech datasets (Speech Corpus Tools), align text and speech (Montréal Forced Aligner), and automatically measure phonetic variables (AutoVOT).
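To give a concrete, if much simplified, sense of what querying an aligned speech dataset can look like, the sketch below reads a hypothetical phone-level CSV export and computes mean vowel duration per speaker. The file name, column names, and vowel set are assumptions made for illustration; this does not reflect the actual interfaces of Speech Corpus Tools, the Montréal Forced Aligner, or AutoVOT.

```python
import csv
from collections import defaultdict

# Hypothetical input: a phone-level CSV exported from an aligned speech corpus,
# with columns "speaker", "phone", "begin", "end" (times in seconds).
VOWELS = {"AA", "AE", "AH", "AO", "EH", "ER", "IH", "IY", "UH", "UW"}

durations = defaultdict(list)
with open("phones.csv", newline="") as f:
    for row in csv.DictReader(f):
        phone = row["phone"].rstrip("012")   # strip ARPABET stress digits
        if phone in VOWELS:
            durations[row["speaker"]].append(float(row["end"]) - float(row["begin"]))

# Report mean vowel duration per speaker.
for speaker, ds in sorted(durations.items()):
    print(f"{speaker}\t{sum(ds) / len(ds):.3f} s\t(n={len(ds)})")
```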

On the engineering side of computational linguistics, Jackie Cheung (Computer Science) uses statistical methods from artificial intelligence and machine learning to generate text and speech that is fluent and appropriate to context. His group's current projects include summarizing fiction, extracting events from text, and adapting language across genres. Derek Ruths (Computer Science) analyzes large-scale human behavior using data from Twitter and other online platforms. McGill also has a strong Digital Humanities community, led by faculty Andrew Piper and Stéfan Sinclair (Languages, Literatures, and Cultures), who analyze a variety of types of textual data using computational methods to address questions about literature, culture, and society.