E-Mail Intelligent Routing and Responding

Pnina Vortman
Haifa Research Lab
vortman@il.ibm.com

Introduction

With the growing population of Internet users and customers with e-mail access, the number of e-mail messages directed to support services has grown tremendously. Support centers are flooded with e-mail and must increase the number of agents who handle it. Moreover, there is a desire to ensure that every e-mail is answered within 48 hours of receipt, a need for central logging and monitoring, and a need to reroute and track e-mail to a skilled agent and to handle it with all the sophistication available today for phone calls.
In addition, e-mail is text based, so text mining technology can be applied to perform additional intelligent tasks. Mail handling should get a very high focus within Call Centers today.
 

From Forrester



Telecom Strategies

____________________________________________________________________
analyzes telecommunications in the era of deregulation and the Internet

A New Call Center Tool -- Automatic E-Mail Distributor
David Goodtree
David M. Cooperstein

Volume One, Number Seventeen

June 30, 1997
 

Aetna, Nike, and others are soliciting customer inquiries on their Web sites by asking users to click a button and fill out an e-mail form. But few firms have worked out how to handle the incoming deluge destined for sales and service reps (see the February, 1997 Telecom Strategies Report, "Call Centers Meet The Web"). Forrester believes that the solution will arrive in a new product, the "automatic e-mail distributor" (AED). Expanding on and integrated with the call center's automatic call distributor, the AED is defined by Forrester as a system for shepherding inbound customer e-mail from receipt through final handling by a group of agents. The AED will:

1) Look up addresses and categorize messages. Like today's screen pop application, the AED will search for a match of the sender's e-mail address in the corporation's customer database. Additionally, by scanning the header and body of the message for keywords like "credit card" or "mortgage," the AED will classify the subject of the e-mail for the next step of processing.

2) Auto-reply. The AED will offer businesses the option to send an instant, canned e-mail reply upon message receipt, promising further action within a specified time. If the incoming message was preformatted by a Web page (like a catalog request), the AED will pass the inquiry to other systems for automatic processing.

3) Route the mail and report on the queue. The AED will route the inquiry to the most appropriate rep for handling based on the customer and subject information already discovered. Supervisors will also be able to monitor and trend the number of messages handled, speed of answer, and agent productivity from a console integrated with the voice management system.
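
The three AED functions listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any vendor's product: the customer table, keyword list, queue names and the handle function are all hypothetical, and the 48-hour promise is simply the service target mentioned in the Introduction.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical data for illustration only; none of these names come from the report.
CUSTOMERS = {"pat@example.com": "C-1001"}                  # e-mail address -> customer id
KEYWORDS = {"credit card": "cards", "mortgage": "loans"}   # keyword -> category
QUEUES = {"cards": "card-services", "loans": "mortgage-desk"}
DEFAULT_QUEUE = "general"


@dataclass
class Handled:
    customer_id: Optional[str]
    category: Optional[str]
    queue: str
    auto_reply: str


def handle(sender: str, subject: str, body: str, promise_hours: int = 48) -> Handled:
    # Step 1: look up the sender, then scan header and body for keywords.
    customer_id = CUSTOMERS.get(sender.lower())
    text = f"{subject} {body}".lower()
    category = next((cat for kw, cat in KEYWORDS.items() if kw in text), None)
    # Step 2: canned acknowledgement promising action within a specified time.
    reply = f"We received your message and will respond within {promise_hours} hours."
    # Step 3: route on the information discovered in step 1.
    queue = QUEUES.get(category, DEFAULT_QUEUE)
    return Handled(customer_id, category, queue, reply)
```

A message that matches no keyword falls through to the default queue, so nothing is dropped when classification fails.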
 

BUILD OR BUY AN AED?

Some components of an AED already exist from vendors like Brightware and Silknet, but no one has assembled the total package yet.  If call center managers need the help today, MIS is forced to cobble together its own system. But in 1998, Forrester sees additional first-generation offers rolling out from:

Computer-telephony ISVs. Firms like Genesys and Nabnasset understand the tie between customer asset management systems and phone calls. Making the leap to e-mail will not be very hard.

E-mail ISVs. Software.com, Lotus, and Oracle already sell corporate e-mail servers and could team up with the ISVs above to build a potent system based on their routing engines.

Switch makers. Lucent, Nortel, Aspect, and Rockwell will become the most credible of the lot by extending their telephony-based queue management expertise to e-mail.
 

PROFICIENT HANDLING OF E-MAIL WILL BEGET OUTBOUND CAMPAIGNS

Once call centers can comfortably handle large flows of customer inbound e-mail, businesses will feel confident that they can kick off outbound campaigns. As a result:

Marketers will weave e-mail into their channel strategy. In addition to setting up "click here to give feedback" buttons, marketers will target customer segments with: 1) teaser messages to pull Web site visitations; 2) clearance promotions to sell overstocks while avoiding print catalog expense; and 3) appreciation letters to thank clients for their patronage, with embedded coupons or notification of a credit line increase.

E-mail addresses will be worth real money. Godiva chocolatier will pay Bon Appétit magazine a nickel or more for subscriber names with e-mail addresses. In turn, the confectioner will blast out pre-Valentine's Day missives pitching its romantic truffle assortment to well-qualified target customers.

Junk e-mail will rise from nuisance to fact of life. Just like postal mail, unsolicited messages will arrive en masse. Occasionally, recipients will be interested; most of the time, the message will end up in the trash. Some consumers will seek out the help of organizations like the Direct Marketing Association to put their names on the "do not e-mail" list.


Objectives

Provide Call Centers with a more "automated" procedure for handling customer e-mail inquiries; ensure that service standards are met, so that all e-mails are responded to in a timely manner; provide quality control standards on all outbound correspondence; maintain a customer contact history log of all e-mail correspondence; and provide machine learning methods for categorization and automatic responding.

IBM Research has today the best technology for e-mail categorization, which is based on up-to-date research in categorization, search technology, clustering and document similarity techniques. The groups at the Haifa Research Lab and at T.J. Watson have been working on e-mail handling technology using Information Retrieval techniques for some time. The "war on the mailers" is on. Currently, IBM computers still handle a large percentage of e-mail traffic. To maintain IBM's position in this area, enhancements to mailers such as automatic categorization, responding and filing are essential.

IBM Call Center solutions, such as CallPath, EarlyCloud and DirectTalk, should be enhanced with features to handle inbound e-mail and generators and responders to handle outbound e-mail.

Turn Correspondence into Market Data

With this e-mail solution, companies can track what was asked, by whom, and how they responded, turning these e-mail queries and answers into valuable market data and creating the opportunity to mine for qualified sales opportunities. Extended with text mining tools applied to the databases, the solution can provide a base for generating reports that give insight into customers' hottest issues. These reports can include what customers inquired about and why they visited the site. They can be combined with quality control reports, making it possible to monitor the questions and the resulting responses and actions. With this data, it is possible to create good FAQ sheets and revise Web pages, so companies can take further action to ensure customer satisfaction and effective marketing.
 

Architecture - How should it work?


E-mail Automation

Base Components

Document similarity engine for English, based on stemming and word distribution analysis methods developed at the IBM Haifa Research Lab.

Text categorization (SuperCat), based on an automatic rule induction method developed at the IBM T.J. Watson Research Center.

Lotus agent: a LotusScript application that runs on the Lotus Notes server and serves as an intelligent agent, reacting to every incoming mail and processing it for automatic categorization and responding.

LotusScript client application (used with a Call Center application such as CallFlow or CallPath), invoked on the agent workstation when a new e-mail reaches the top of the agent's worklist.

Categorization development software: an off-line preparation phase, executed as a setup process, that creates the SuperCat rules.

Administration tools to edit and manipulate the mapping table between categories and skills, agents or queues.

Logging, tracking and monitoring tools to log every incoming mail and track its time in the queue, the method used for the reply, the time taken to handle the mail until completion, and statistics about agent performance, with reporting capabilities for post-mortem analysis.

Call center software products that can handle queues, workflow and worklists and can distribute messages between agents (such as CallFlow or CallPath).
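
To make the roles of the mapping table and the logging tools concrete, the following Python sketch shows one possible shape for them. It is an assumption for illustration, not the design of CallFlow, CallPath or the Lotus agent: the category and queue names, the MailLogEntry record and both helper functions are hypothetical. The 48-hour limit is the response target stated in the Introduction.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical mapping table between categories and queues (edited by an administrator).
CATEGORY_TO_QUEUE = {"billing": "accounts-queue", "install": "tech-support-queue"}


@dataclass
class MailLogEntry:
    message_id: str
    category: str
    received: datetime
    queue: str = ""
    reply_method: str = ""                 # e.g. "automatic" or "agent"
    completed: Optional[datetime] = None   # None while the mail is still open

    def handling_time(self) -> Optional[timedelta]:
        """Time from receipt to completion, or None if not yet completed."""
        return None if self.completed is None else self.completed - self.received


def enqueue(entry: MailLogEntry, default_queue: str = "general-queue") -> MailLogEntry:
    """Assign a queue from the mapping table, falling back to a default queue."""
    entry.queue = CATEGORY_TO_QUEUE.get(entry.category, default_queue)
    return entry


def overdue(log, now: datetime, limit: timedelta = timedelta(hours=48)):
    """Open mails that have waited longer than the service limit."""
    return [e for e in log if e.completed is None and now - e.received > limit]
```

Keeping the category-to-queue mapping in a table rather than in code is what lets the administration tools change routing without touching the agent itself.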

Presentation explaining the main principles of the E-Mail Responder is attached below:
E-mail Responder

Similarity Engine Details

 

Basic Technologies

1. document similarity methods (information retrieval)

Statistical text analysis techniques are used to extract from textual data the most significant words and word combinations.

This task is conducted in several steps:

1.1 tokenization and lemmatization

Words and sentences are first isolated from documents (via an automaton-based mechanism), then every individual word is mapped into a canonical form. The individual word is the atomic unit of conceptual information in most indexing schemes. Therefore, it is highly desirable that the variants of the same word (plural vs. singular, various verb inflections, construct forms) be mapped into the same canonical form, the "lemma", as they are conceptually identical. This lemmatization stage is crucial and can be handled by one of two approaches: "stemming" (based on ad-hoc suffix stripping rules and exception lists) or morphological analysis (based on actual grammatical rules and a dictionary). The approach we propose will make use of context-based disambiguation methods that proved valuable in other indexing applications (see [Maarek et al. 1989]).
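
As a rough illustration of the first approach only, the Python sketch below tokenizes text and applies suffix stripping with an exception list. It is a toy, not the Haifa engine: the rule list, the exception list and the minimum stem length of three letters are assumptions chosen for the example, and no context-based disambiguation is attempted.

```python
import re

# Hypothetical exception list and suffix rules, for illustration only.
EXCEPTIONS = {"children": "child", "went": "go", "mice": "mouse"}
SUFFIX_RULES = [
    ("ies", "y"),    # queries -> query
    ("sses", "ss"),  # addresses -> address
    ("ing", ""),     # printing -> print
    ("ed", ""),      # returned -> return
    ("ss", "ss"),    # address stays address (protects it from the "s" rule)
    ("s", ""),       # printers -> printer
]
MIN_STEM = 3  # do not strip a suffix if fewer than 3 letters would remain


def tokenize(text):
    """Isolate lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def stem(word):
    """Map a word to a canonical form: exceptions first, then the first matching rule."""
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: len(word) - len(suffix)] + replacement
    return word


def lemmas(text):
    return [stem(w) for w in tokenize(text)]
```

Rules this crude will mis-stem some words, which is the reason the text contrasts stemming with dictionary-based morphological analysis.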

Assuming that a unique lemma has been obtained, the next step is to obtain a profile of indexing units that will characterize each document.

1.2 indexing

The indexing stage consists of inferring, from the list of lemmas that compose a text, the most significant ones so as to form a "profile" or "document vector" in Salton's vector space model [Salton & McGill 83], in which each distinct indexing unit induces a distinct dimension. Most indexing units are based on single words (in their lemmatized form). The document similarity engine takes advantage of an original indexing unit developed at IBM Research, based upon the notion of "lexical affinity" and embodied in several IBM products, in which indexing units consist of word pairs in close context that disambiguate each other. Lexical-affinity based schemes have been shown to give higher precision in retrieval [Maarek 91].
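
The idea of word-pair indexing units in a vector space can be sketched as follows. This is a simplified illustration, not the IBM engine: the window of five words, the use of raw pair counts as weights and the cosine measure are assumptions made for the example, and a real system would also filter out insignificant words.

```python
import math
from collections import Counter


def lexical_affinities(lemmas, window=5):
    """Count unordered pairs of distinct lemmas that co-occur within `window` positions."""
    pairs = Counter()
    for i, left in enumerate(lemmas):
        for right in lemmas[i + 1 : i + 1 + window]:
            if left != right:
                pairs[tuple(sorted((left, right)))] += 1
    return pairs


def cosine(a, b):
    """Cosine of the angle between two profiles; each distinct pair is one dimension."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Usage: two short lemmatized messages share one affinity, (account, password).
d1 = lexical_affinities(["reset", "password", "account"])
d2 = lexical_affinities(["account", "password", "locked"])
similarity = cosine(d1, d2)
```

Two documents score as similar only when they share word pairs, not merely single words, which is where the gain in precision is expected to come from.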

2. text categorization (machine learning)

The SuperCat text categorization solution is based on supervised machine learning methods that induce symbolic rules, and includes:

(A) Categorization Development software and

(B) Runtime Categorization system.

The Categorization Development software consists of three components:

1. Feature Selection and Vector Generation,

2. Machine Learning Method, and

3. Rules Generation and Simplification.

The Categorization Development phase (or training phase) is a batch process. The time required by the machine learning method depends on the amount of data and the number of categories. It is, however, reasonably fast on a reasonably powerful machine: although we do not have precise benchmark timings, our experience indicates that training should take roughly no more than 15 minutes per category, including feature selection, vector generation and rule induction.

The runtime system consists of:

1. Linguistic Preprocessor,

2. Rule Applier. Note that one of the competitive edges of SuperCat is that rules can be hand-altered or hand-built as well as automatically generated. Thus the rule applier can mix and match machine-generated and hand-built rules.

3. Performance Tracker, which provides information on precision and recall, overall and on a per-category basis, reports which rules applied correctly or incorrectly on test data, and gives information on processing speed (documents per second, rules per second, bytes per second). In categorizing a new document, the system reports back as "supporting data" which rules applied to the document. One could also customize a system so that the words or phrases in a document that were involved in rule application are highlighted, so that the relevant portion of the text can be examined (not currently implemented).
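
The runtime components above can be illustrated with a small Python sketch of a rule applier and a per-category precision and recall calculation. The rule format shown here (a category plus a set of terms that must all be present) is an assumption made for the example; the actual SuperCat rule syntax and API are not described in this document, and every name below is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    category: str
    all_of: frozenset   # every term must appear in the document
    origin: str         # "induced" or "hand-built"

    def fires(self, terms):
        return self.all_of <= terms


# Machine-generated and hand-built rules are applied side by side.
RULES = [
    Rule("billing", frozenset({"invoice"}), "induced"),
    Rule("billing", frozenset({"refund", "charge"}), "hand-built"),
    Rule("install", frozenset({"setup", "error"}), "induced"),
]


def categorize(terms, rules=RULES):
    """Return {category: [rules that fired]}; the fired rules are the supporting data."""
    fired = {}
    for rule in rules:
        if rule.fires(terms):
            fired.setdefault(rule.category, []).append(rule)
    return fired


def precision_recall(predicted, actual, category):
    """predicted and actual are parallel lists of category sets, one per test document."""
    pairs = list(zip(predicted, actual))
    tp = sum(category in p and category in a for p, a in pairs)
    fp = sum(category in p and category not in a for p, a in pairs)
    fn = sum(category not in p and category in a for p, a in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Because each rule is evaluated against the document alone, a document can receive several categories at once, and the cost of categorizing it does not grow with the number of training documents.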

Advantages of Rule-Induction Systems:

The use of rules has a number of advantages:

1. Rule application is fast and does not depend on the size of the database of documents (unlike, e.g., Nearest Neighbor algorithms).

2. Training is straightforward.

3. Rules are understandable.

4. Rules can be hand-altered or hand-crafted.

5. Rule induction approaches support multiple categorization and hierarchical categorization.

SuperCat Runtime System Requirements:

The SuperCat runtime system runs on Windows 95, Windows NT, AIX and OS/2 and requires only modest CPU and RAM, e.g., a 100 MHz Pentium with 16-32 MB of RAM. The runtime classification system is delivered as a DLL with an easy-to-use API.