Rapid grammar development and parsing: Constraint dependency grammars with abstract role values

Christopher M White, Purdue University

Abstract

Constraint Dependency Grammar (CDG) is a constraint-based grammatical formalism that has proven effective in parsing many natural languages. Unfortunately, CDG grammar writers face the difficult task of creating hundreds of interrelated constraints for each new grammar. We have therefore developed a grammar induction technique for CDGs that makes grammar development easier. The technique uses Abstract Role Values (ARVs), a vector-based representation of CDG constraints. We show how ARVs represent constraints and how a CDG parser models and uses them. We describe an interface for annotating training sentences with dependency and feature information, and show how these annotations are used to induce ARVs. We then provide an active learning algorithm that uses early training information to automatically annotate later training sentences. This algorithm improves the consistency and speed of annotation by progressively changing the grammar writer's job from annotating new sentences to merely verifying and correcting automatically generated annotations. Experiments demonstrate that ARV constraints induced from training sentences yield grammars that are tighter and parse faster than CDGs with hand-generated constraints.

Degree

Ph.D.

Advisors

Harper, Purdue University.

Subject Area

Computer science; Electrical engineering
