Motivation
Natural language generation (NLG) is essential in dialogue systems. To communicate effectively, the NLG system must generate sentences that are grammatical, coherent, fluent, and concise. Many decisions must be made before semantic concepts are transformed into a string of words. Our goal is to build an NLG system that is easy to use and requires relatively little customization effort for new domains.
Approach
SEGUE is a hybrid system that combines case-based reasoning (CBR) and rule-based approaches for NLG.
There are several approaches to building a surface
generator. Template-based systems are easy to develop, but the text
they produce lacks variety. Rule-based systems, on the other hand,
can produce more varied text, but developers with linguistic
sophistication are required to develop and maintain the generators.
Recently, statistics-based generation systems have achieved some
success. The main deterrent to the use of statistics-based generators
is that they require a large training corpus. Without enough training
instances, their performance degrades. SEGUE, in contrast,
incorporates the advantages of both statistical and rule-based
approaches. It uses a relatively small annotated corpus but performs
rule-based adaptation to ensure the adapted sentences are
grammatically correct. As a result, it can achieve higher accuracy than
statistical NLG systems while requiring a much smaller corpus collection
effort. More significantly, because CBR can learn from experience,
case-based generators can perform tasks more efficiently
as they accumulate solutions.
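The retrieve-then-adapt cycle described above can be illustrated with a minimal sketch. It is not SEGUE's implementation: the `Case` structure, the Jaccard similarity measure, the template-filling adaptation rule, and the sample data are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    """One annotated corpus instance (hypothetical annotation scheme)."""
    slots: frozenset  # names of the semantic slots the sentence realizes
    template: str     # sentence text with {slot} placeholders


def jaccard(a, b):
    """Overlap between two slot sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def retrieve(cases, request_slots):
    """Return the stored case whose slot set is most similar to the request."""
    return max(cases, key=lambda c: jaccard(c.slots, request_slots))


def adapt(case, values):
    """Toy adaptation rule: fill the retrieved template with the requested values.

    Raises KeyError if the case needs a slot the request does not supply.
    """
    return case.template.format(**values)


def generate(cases, values):
    """Retrieve the closest case, then adapt it to the requested content."""
    case = retrieve(cases, frozenset(values))
    return adapt(case, values)


# Made-up example corpus; a real corpus would carry richer annotation.
CASES = [
    Case(frozenset({"city", "price"}), "The house in {city} costs {price}."),
    Case(frozenset({"city", "bedrooms"}), "The {bedrooms}-bedroom house is in {city}."),
]

sentence = generate(CASES, {"city": "Boston", "price": "$300,000"})
print(sentence)  # The house in Boston costs $300,000.
```

A real adaptation step would also have to insert, delete, or substitute constituents when no stored case matches the request exactly, which is where grammar rules are needed to keep the result well-formed.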
Recently, we developed a novel instance-based sentence boundary
determination method for natural language generation that optimizes a
set of criteria based on examples in a corpus. Compared to existing
sentence boundary determination approaches, our work makes two
contributions. First, our approach provides a general,
domain-independent framework that effectively addresses sentence
boundary determination by balancing a comprehensive set of constraints
on sentence complexity and quality. Second, our approach can
simulate the characteristics and the style of naturally occurring
sentences in an application domain since our solutions are optimized
based on their similarities to examples in a corpus. In our blind
tests, users strongly preferred sentences generated by our
approach over those generated by a widely used approach for making
sentence boundary decisions in NLG.
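The idea of choosing sentence boundaries by similarity to corpus examples can be sketched as follows. This is a simplified illustration rather than the method itself: the exhaustive enumeration, the symmetric-difference distance, the `boundary_penalty` weight, and the sample propositions are all assumptions.

```python
from itertools import combinations


def segmentations(items):
    """Yield every way to split an ordered list into contiguous groups."""
    n = len(items)
    for k in range(n):  # k = number of internal boundaries
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [items[i:j] for i, j in zip(bounds, bounds[1:])]


def distance(group, example):
    """Number of propositions present in one of the two sets but not the other."""
    return len(set(group) ^ set(example))


def cost(segmentation, corpus, boundary_penalty=0.5):
    """Sum each group's distance to its nearest example, plus a per-sentence penalty."""
    nearest = sum(min(distance(g, ex) for ex in corpus) for g in segmentation)
    return nearest + boundary_penalty * len(segmentation)


def best_segmentation(items, corpus):
    """Pick the segmentation with the lowest total cost (brute force)."""
    return min(segmentations(items), key=lambda s: cost(s, corpus))


# Made-up corpus: each example is the set of propositions one sentence expressed.
CORPUS = [
    {"type", "bedrooms", "city"},
    {"price"},
]
props = ["type", "bedrooms", "city", "price"]
print(best_segmentation(props, CORPUS))
# [['type', 'bedrooms', 'city'], ['price']]
```

Brute-force enumeration grows exponentially with the number of propositions, so a practical system would need a more efficient search; the sketch only shows how corpus examples can drive the boundary decision.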