SEGUE

Motivation

Natural language generation (NLG) is essential in dialogue systems. To communicate effectively, an NLG system must generate sentences that are grammatical, coherent, fluent, and concise. Many decisions must be made before semantic concepts are transformed into a string of words. Our goal is to build an NLG system that is easy to use and requires relatively little customization effort for new domains.


Approach

SEGUE is a hybrid system that combines case-based reasoning (CBR) and rule-based approaches for NLG. There are several approaches to building a surface generator. Template-based systems are easy to develop, but the text they produce lacks variety. Rule-based systems, on the other hand, can produce more varied text, but developers with linguistic sophistication are required to develop and maintain the generators. Recently, statistics-based generation systems have achieved some success. The main deterrent to the use of statistics-based generators is that they require a large training corpus. Without enough training instances, their performance degrades. SEGUE, in contrast, incorporates the advantages of both statistical and rule-based approaches. It uses a relatively small annotated corpus but performs rule-based adaptation to ensure that the adapted sentences are grammatically correct. As a result, it can achieve higher accuracy than statistical NLG systems while requiring a much smaller corpus collection effort. More significantly, because CBR has learning capability, case-based generators can perform tasks more efficiently as they accumulate solutions.
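The retrieve, adapt, and retain cycle described above can be illustrated with a minimal sketch. This is not SEGUE's implementation: the `Case` structure, the attribute-overlap similarity measure, the value-substitution step, and the article-agreement rule are all hypothetical stand-ins chosen to keep the example small.

```python
import re
from dataclasses import dataclass


@dataclass
class Case:
    """One annotated corpus instance (hypothetical format)."""
    concepts: dict  # semantic input, attribute -> value
    sentence: str   # corpus sentence that realizes those concepts


def similarity(query, case):
    """Jaccard overlap of attribute names (an assumed, deliberately simple measure)."""
    q, c = set(query), set(case.concepts)
    return len(q & c) / len(q | c) if (q | c) else 0.0


def retrieve(query, case_base):
    """Return the stored case most similar to the query."""
    return max(case_base, key=lambda case: similarity(query, case))


def fix_articles(sentence):
    """Toy adaptation rule: make 'a'/'an' agree with the following word."""
    sentence = re.sub(r"\b([Aa]) (?=[aeiouAEIOU])", r"\1n ", sentence)
    sentence = re.sub(r"\b([Aa])n (?=[^aeiouAEIOU\W])", r"\1 ", sentence)
    return sentence


def adapt(query, case):
    """Substitute the query's values into the retrieved sentence, then repair grammar."""
    sentence = case.sentence
    for attr, old_value in case.concepts.items():
        if attr in query:
            sentence = sentence.replace(old_value, query[attr])
    return fix_articles(sentence)


def generate(query, case_base):
    """Retrieve the closest case, adapt it, and retain the result as a new case."""
    best = retrieve(query, case_base)
    sentence = adapt(query, best)
    case_base.append(Case(dict(query), sentence))  # the case base grows with use
    return sentence


case_base = [Case({"type": "house", "city": "Boston"}, "This is a house in Boston.")]
result = generate({"type": "apartment", "city": "Cambridge"}, case_base)
print(result)  # This is an apartment in Cambridge.
```

The retained case is what gives a case-based generator its learning capability in this sketch: a later query about an apartment can reuse the adapted sentence directly instead of repeating the adaptation.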

Recently, we developed a novel instance-based sentence boundary determination method for natural language generation that optimizes a set of criteria based on examples in a corpus. Compared to existing sentence boundary determination approaches, our work offers two significant contributions. First, our approach provides a general, domain-independent framework that addresses sentence boundary determination by balancing a comprehensive set of constraints related to sentence complexity and quality. Second, our approach can simulate the characteristics and style of naturally occurring sentences in an application domain, since our solutions are optimized based on their similarity to examples in a corpus. In our blind tests, users strongly preferred sentences generated using our approach over those produced by a widely used approach for making sentence boundary decisions in NLG.
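As a toy illustration of instance-based sentence boundary determination, the sketch below picks the split of an ordered proposition list whose sentences most resemble corpus examples. The actual criteria and search procedure are not given here, so everything in the sketch is an assumption: propositions are plain labels, each corpus instance is the set of labels one sentence realized, the cost is a set difference plus a hypothetical `boundary_cost`, and the search is exhaustive.

```python
from itertools import combinations


def segmentations(props):
    """Yield every way to split a non-empty ordered list into contiguous sentences."""
    n = len(props)
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [props[i:j] for i, j in zip(bounds, bounds[1:])]


def sentence_cost(sentence, corpus):
    """Distance to the closest corpus instance (size of the symmetric difference)."""
    labels = set(sentence)
    return min(len(labels ^ instance) for instance in corpus)


def best_segmentation(props, corpus, boundary_cost=0.5):
    """Choose the split with the lowest total cost; boundary_cost discourages
    breaking the content into needlessly many sentences."""
    def total(seg):
        return sum(sentence_cost(s, corpus) for s in seg) + boundary_cost * (len(seg) - 1)
    return min(segmentations(props), key=total)


# Hypothetical corpus: each set lists the propositions one example sentence conveyed.
corpus = [{"type", "bedrooms", "city"}, {"price", "tax"}, {"type", "city"}]
props = ["type", "bedrooms", "city", "price", "tax"]
best = best_segmentation(props, corpus)
print(best)  # [['type', 'bedrooms', 'city'], ['price', 'tax']]
```

Exhaustive enumeration grows exponentially with the number of propositions, so it is only workable for short inputs like this one.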


Publications