I work on formal semantics of natural language — in particular, how to design, annotate, and model semantics in a scalable, data-driven way while taking advantage of our understanding of linguistic structure. I have worked on approaches to crowdsourcing annotation for syntactic parsing, semantic role labeling, and predicate-argument structure.
I am heavily involved in the QA-SRL Project, where I build systems for annotating and modeling linguistic structure at large scale. I have released two corpora:
- The QAMR Corpus — over 5,000 sentences annotated with Question-Answer Meaning Representation.
- QA-SRL Bank 2.0 — over 64,000 sentences annotated with Question-Answer Semantic Role Labels.
I am also interested in how we may train and evaluate models' understanding of linguistic structure without direct supervision, in grounded representations of language meaning (i.e., for semantic parsing), and in bringing grounded and broad-coverage semantics closer together. See my publications for a full list of my work.
I develop and maintain a small set of open-source tools for my research. They're written in Scala and are available on Maven Central. Some of this code is currently being used by colleagues for follow-on projects. If you are interested in using my code or contributing, don't hesitate to get in touch or raise issues on GitHub.
- spacro — Library for complex crowdsourcing pipelines built from single-page webapps
- nlpdata — Basic utilities for working with language data
- radhoc — Higher-order React components for modular UI design on the web
- qasrl-crowdsourcing — The QA-SRL annotation pipeline and general QA-SRL utilities such as auto-complete and auto-suggest for questions
- qasrl-bank-scala — Client library for the QA-SRL Bank 2.0 dataset
The Theory of Correlation Formulas and their Application to Discourse Coherence
Undergraduate Honors Thesis, UT Austin, 2015