In formal linguistics, combinatory categorial grammar (CCG) is an efficiently parsable, yet linguistically expressive, grammar formalism. It has a transparent interface between surface syntax and underlying semantic representation, including predicate-argument structure, quantification, and information structure. The formalism generates constituency-based structures (as opposed to dependency-based ones) and is therefore a type of phrase structure grammar (as opposed to a dependency grammar).
CCG relies on combinatory logic, which has the same expressive power as the lambda calculus, but builds its expressions differently. The first linguistic and psycholinguistic arguments for basing a grammar on combinators were put forth by Steedman and Szabolcsi.
More recent prominent proponents of CCG include Pauline Jacobson and Jason Baldridge, who have continued its development. In these newer approaches, the combinator B (the "compositor") is useful for creating long-distance dependencies (as in, e.g., "Who do you think Mary is talking about?"), and the combinator W (the "duplicator") is useful for the lexical interpretation of reflexive pronouns, as in "Mary talks about herself". Together with I (the identity mapping) and C (the "permutator"), these form a set of primitive, non-interdefinable combinators. Jacobson interprets personal pronouns as the combinator I; their binding is aided by a complex combinator Z, as in "Mary lost her way". Z is definable using W and B.
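The behaviour of these combinators can be sketched with curried Python functions. This is only an illustration: the definition of Z as `B(B(W))(B)` assumes the usual reading Z f g x = f (g x) x, and the toy meanings `talks_about`, `lost`, and `way_of` are hypothetical placeholders, not part of any published analysis.

```python
# Curried combinators written as ordinary Python functions.
I = lambda x: x                               # identity mapping
B = lambda f: lambda g: lambda x: f(g(x))     # "compositor"
W = lambda f: lambda x: f(x)(x)               # "duplicator"
C = lambda f: lambda x: lambda y: f(y)(x)     # "permutator"

# One way to define Z from W and B, assuming Z f g x = f (g x) x.
Z = B(B(W))(B)

# Hypothetical toy meanings: each verb takes its object first, then its subject.
talks_about = lambda obj: lambda subj: f"talks_about({subj}, {obj})"
lost = lambda obj: lambda subj: f"lost({subj}, {obj})"
way_of = lambda owner: f"way_of({owner})"

# W feeds the same individual into both argument slots (reflexive reading).
reflexive = W(talks_about)("mary")

# Z binds the pronoun inside the object to the subject.
bound = Z(lost)(way_of)("mary")

print(reflexive)  # talks_about(mary, mary)
print(bound)      # lost(mary, way_of(mary))
```

Expanding the definition by hand, `Z(f)(g)` reduces to `W(B(f)(g))`, which is why a single application to the subject fills both the subject slot and the pronoun.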
The CCG formalism defines a number of combinators (the most common being application, composition, and type-raising). These operate on syntactically typed lexical items by means of natural-deduction-style proofs. The goal of a proof is to find some way of applying the combinators to a sequence of lexical items so that every lexical item is used; once the proof is complete, the resulting type is the type of the whole expression. Thus, proving that some sequence of words is a sentence of some language amounts to proving that the words reduce to the type S.
The syntactic type of a lexical item can be either a primitive type, such as S, N, or NP, or a complex type, such as $S\backslash NP$ or $NP/N$.
The complex types, schematizable as $X/Y$ and $X\backslash Y$, denote functor types that take an argument of type Y and return an object of type X. A forward slash denotes that the argument should appear to the right, while a backslash denotes that the argument should appear on the left. Any type can stand in for the X and Y here, making syntactic types in CCG a recursive type system.
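The recursive nature of these types can be sketched as a small data structure. The names `Func`, `Cat`, `show`, and `depth` are illustrative choices, and the example categories (a determiner, an intransitive verb, a transitive verb) are conventional assumptions rather than a fixed lexicon.

```python
from typing import NamedTuple, Union


class Func(NamedTuple):
    # A complex (functor) type: consumes `arg` and returns `result`.
    result: "Cat"
    slash: str  # "/" = argument on the right, "\\" = argument on the left
    arg: "Cat"


# A category is either a primitive name (a plain string) or a Func.
Cat = Union[str, Func]


def show(cat: Cat) -> str:
    # Render a category, parenthesising every complex type.
    if isinstance(cat, str):
        return cat
    return f"({show(cat.result)}{cat.slash}{show(cat.arg)})"


def depth(cat: Cat) -> int:
    # Nesting depth: complex types may embed other complex types.
    if isinstance(cat, str):
        return 0
    return 1 + max(depth(cat.result), depth(cat.arg))


determiner = Func("NP", "/", "N")             # takes N on the right, gives NP
intransitive = Func("S", "\\", "NP")          # takes NP on the left, gives S
transitive = Func(intransitive, "/", "NP")    # a functor returning a functor

print(show(transitive), depth(transitive))
```

Because `result` and `arg` are themselves categories, nothing limits how deeply functor types can nest.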
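The application combinators, often denoted by > for forward application and < for backward application, apply a lexical item with a functor type to an argument with an appropriate type. The definition of application may be given as:
$$\frac{\alpha : X/Y \qquad \beta : Y}{\alpha\beta : X}\;{>} \qquad\qquad \frac{\beta : Y \qquad \alpha : X\backslash Y}{\beta\alpha : X}\;{<}$$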
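As a rough sketch, application can be written as two functions over a simple category representation. The representation (primitive types as strings, functor types as a `Func` tuple) and the function names are assumptions made for this example; the rules implemented are the usual schemas X/Y Y ⇒ X and Y X\Y ⇒ X.

```python
from typing import NamedTuple, Optional, Union


class Func(NamedTuple):
    result: "Cat"
    slash: str  # "/" or "\\"
    arg: "Cat"


Cat = Union[str, Func]


def forward_apply(left: Cat, right: Cat) -> Optional[Cat]:
    # Rule >: a functor X/Y followed by its argument Y yields X.
    if isinstance(left, Func) and left.slash == "/" and left.arg == right:
        return left.result
    return None  # the rule does not apply


def backward_apply(left: Cat, right: Cat) -> Optional[Cat]:
    # Rule <: an argument Y followed by a functor X\Y yields X.
    if isinstance(right, Func) and right.slash == "\\" and right.arg == left:
        return right.result
    return None  # the rule does not apply


np_from_det = forward_apply(Func("NP", "/", "N"), "N")    # NP/N  N  => NP
s_from_vp = backward_apply("NP", Func("S", "\\", "NP"))   # NP  S\NP => S
print(np_from_det, s_from_vp)
```

Returning `None` when a rule does not match keeps the functions total, which is convenient if they are later tried in sequence by a parser.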
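The composition combinators, often denoted by $\mathbf{B}_{>}$ for forward composition and $\mathbf{B}_{<}$ for backward composition, are similar to function composition from mathematics, and can be defined as follows:
$$\frac{\alpha : X/Y \qquad \beta : Y/Z}{\alpha\beta : X/Z}\;\mathbf{B}_{>} \qquad\qquad \frac{\beta : Y\backslash Z \qquad \alpha : X\backslash Y}{\beta\alpha : X\backslash Z}\;\mathbf{B}_{<}$$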
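Composition can be sketched in the same style. The code below assumes the usual schemas X/Y Y/Z ⇒ X/Z (forward) and Y\Z X\Y ⇒ X\Z (backward); the `Func` representation and function names are illustrative only.

```python
from typing import NamedTuple, Optional, Union


class Func(NamedTuple):
    result: "Cat"
    slash: str  # "/" or "\\"
    arg: "Cat"


Cat = Union[str, Func]


def forward_compose(left: Cat, right: Cat) -> Optional[Cat]:
    # X/Y followed by Y/Z yields X/Z.
    if (isinstance(left, Func) and isinstance(right, Func)
            and left.slash == "/" and right.slash == "/"
            and left.arg == right.result):
        return Func(left.result, "/", right.arg)
    return None


def backward_compose(left: Cat, right: Cat) -> Optional[Cat]:
    # Y\Z followed by X\Y yields X\Z.
    if (isinstance(left, Func) and isinstance(right, Func)
            and left.slash == "\\" and right.slash == "\\"
            and right.arg == left.result):
        return Func(right.result, "\\", left.arg)
    return None


vp = Func("S", "\\", "NP")
raised_subject = Func("S", "/", vp)      # S/(S\NP)
transitive_verb = Func(vp, "/", "NP")    # (S\NP)/NP

# S/(S\NP) composed with (S\NP)/NP gives S/NP: a sentence missing its object.
print(forward_compose(raised_subject, transitive_verb))
```

The result still waits for the verb's object, which is what lets a subject and a transitive verb combine before the object has been seen.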
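The type-raising combinators, often denoted by $\mathbf{T}_{>}$ for forward type-raising and $\mathbf{T}_{<}$ for backward type-raising, convert an argument type (usually a primitive type) to a functor type, which takes as its argument a functor that takes the original (i.e., prior to raising) argument type:
$$\frac{\alpha : X}{\alpha : T/(T\backslash X)}\;\mathbf{T}_{>} \qquad\qquad \frac{\alpha : X}{\alpha : T\backslash(T/X)}\;\mathbf{T}_{<}$$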
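Type-raising can be sketched as a function that wraps a category in a new functor type. The schemas assumed here are X ⇒ T/(T\X) for forward raising and X ⇒ T\(T/X) for backward raising; the target type T is passed explicitly, and all names are illustrative.

```python
from typing import NamedTuple, Union


class Func(NamedTuple):
    result: "Cat"
    slash: str  # "/" or "\\"
    arg: "Cat"


Cat = Union[str, Func]


def forward_raise(cat: Cat, target: Cat) -> Func:
    # X becomes T/(T\X): it now looks rightward for a functor that wanted X on its left.
    return Func(target, "/", Func(target, "\\", cat))


def backward_raise(cat: Cat, target: Cat) -> Func:
    # X becomes T\(T/X): it now looks leftward for a functor that wanted X on its right.
    return Func(target, "\\", Func(target, "/", cat))


raised = forward_raise("NP", "S")   # S/(S\NP)
verb_phrase = Func("S", "\\", "NP")

# The raised subject takes the verb phrase as its argument, reversing
# the usual functor/argument roles while yielding the same result type.
combined = raised.result if raised.arg == verb_phrase else None
print(raised, combined)
```

Raising changes which item acts as the functor but not the type of the combination, so it adds derivations without changing what is ultimately derived here.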
The sentence "the dog bit John" has a number of different possible proofs. Below are a few of them. The variety of proofs demonstrates that in CCG, unlike in other models of grammar, sentences do not have a single structure.
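Let the types of these lexical items be
$$\text{the} : NP/N, \qquad \text{dog} : N, \qquad \text{John} : NP, \qquad \text{bit} : (S\backslash NP)/NP.$$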
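We can perform the simplest proof using application alone: "the" ($NP/N$) applies forward to "dog" ($N$) to give $NP$; "bit" ($(S\backslash NP)/NP$) applies forward to "John" ($NP$) to give $S\backslash NP$; and backward application of $S\backslash NP$ to the subject $NP$ then gives $S$.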
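The application-only proof can be replayed step by step in code. This is a minimal sketch: the lexicon below uses the conventional categories for these four words (an assumption if the document's own assignments differ), and `apply_pair` is a hypothetical helper that tries forward and then backward application.

```python
from typing import NamedTuple, Optional, Union


class Func(NamedTuple):
    result: "Cat"
    slash: str  # "/" or "\\"
    arg: "Cat"


Cat = Union[str, Func]


def apply_pair(left: Cat, right: Cat) -> Optional[Cat]:
    # Forward application (>): X/Y  Y  =>  X
    if isinstance(left, Func) and left.slash == "/" and left.arg == right:
        return left.result
    # Backward application (<): Y  X\Y  =>  X
    if isinstance(right, Func) and right.slash == "\\" and right.arg == left:
        return right.result
    return None


# Assumed lexicon for the example sentence.
lexicon = {
    "the": Func("NP", "/", "N"),
    "dog": "N",
    "bit": Func(Func("S", "\\", "NP"), "/", "NP"),
    "John": "NP",
}

subject = apply_pair(lexicon["the"], lexicon["dog"])      # >  gives NP
predicate = apply_pair(lexicon["bit"], lexicon["John"])   # >  gives S\NP
sentence = apply_pair(subject, predicate)                 # <  gives S

print(subject, predicate, sentence)
```

All four words are consumed and the final category is S, which is the condition for the word sequence to count as a sentence.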
By instead using type-raising and composition, we can obtain a fully incremental, left-to-right proof. The ability to construct such a proof is an argument for the psycholinguistic plausibility of CCG, because listeners do in fact construct partial interpretations (syntactic and semantic) of utterances before they have been completed.
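One possible incremental derivation can be sketched as follows. The particular sequence of steps (apply, raise the subject, compose with the verb, apply to the object) and the lexical categories are assumptions chosen for illustration; other left-to-right derivations may exist.

```python
from typing import NamedTuple, Optional, Union


class Func(NamedTuple):
    result: "Cat"
    slash: str  # "/" or "\\"
    arg: "Cat"


Cat = Union[str, Func]


def forward_apply(left: Cat, right: Cat) -> Optional[Cat]:
    # X/Y  Y  =>  X
    if isinstance(left, Func) and left.slash == "/" and left.arg == right:
        return left.result
    return None


def forward_compose(left: Cat, right: Cat) -> Optional[Cat]:
    # X/Y  Y/Z  =>  X/Z
    if (isinstance(left, Func) and isinstance(right, Func)
            and left.slash == "/" and right.slash == "/"
            and left.arg == right.result):
        return Func(left.result, "/", right.arg)
    return None


def forward_raise(cat: Cat, target: Cat) -> Func:
    # X  =>  T/(T\X)
    return Func(target, "/", Func(target, "\\", cat))


# Assumed lexical categories.
the, dog, john = Func("NP", "/", "N"), "N", "NP"
bit = Func(Func("S", "\\", "NP"), "/", "NP")

trace = [the]                                       # after "the":  NP/N
trace.append(forward_apply(trace[-1], dog))         # after "dog":  NP
raised = forward_raise(trace[-1], "S")              # raise NP to S/(S\NP)
trace.append(forward_compose(raised, bit))          # after "bit":  S/NP
trace.append(forward_apply(trace[-1], john))        # after "John": S

print(trace)
```

Every prefix of the sentence receives a single category in `trace`, which is the property the incremental-processing argument relies on.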
In terms of the Chomsky–Schützenberger hierarchy, CCGs can generate context-free languages, and some but not all context-sensitive languages.
An example of a non-context-free language that CCGs can generate is the language $\{a^n b^n c^n d^n : n \geq 0\}$ (which is an indexed language). A grammar for this language can be found in Vijay-Shanker and Weir (1994).
Vijay-Shanker and Weir (1994) demonstrate that linear indexed grammars, combinatory categorial grammars, tree-adjoining grammars, and head grammars are weakly equivalent formalisms, in that they all define the same string languages. Kuhlmann et al. (2015) show that this equivalence, and the ability of CCG to describe $\{a^n b^n c^n d^n : n \geq 0\}$, rely crucially on the ability to restrict the use of the combinatory rules to certain categories, in ways not explained above.