The projects I am currently working on include:
- Language digitization: creating a multilingual corpus of aligned and analyzed text, as a digital form of language documentation and description, and as a platform to study unsupervised learning of machine translation systems.
- Dependency parsing: learning dependency parsers for languages with nonplanar dependency graphs.
I am also interested in the following topics:
- semisupervised learning and spectral methods
- information extraction, especially in the biomedical domain
- partial parsing and deterministic parsing
- grammatical inference
- conversational agents
- spoken language systems
- automated phonetic transcription