IBM Research
The Data Analytics Research Project
Technical Activities


Our research activities generally result in technical reports, conference papers, and articles in technical journals and magazines. For a more comprehensive perspective, please check the group's publication page. Our development activities generally result in software for middleware as well as end-user applications. Some of this software has been transferred into IBM's product and solution offerings, while some is still in research prototype mode. Please contact us if you are interested in any of these.


Software

Lightweight Rule Induction (LRI)

Advanced Targeted Marketing for Single Events (ATM-SE)

Model Case Generator (MCG)

Lightweight Document Matcher (LDM)

Underwriting Profitability Analysis solution (UPA)
Probabilistic Estimation class library (ProbE)
Rule Abstraction for Modeling & Prediction (RAMP)





Lightweight Rule Induction (LRI)

Lightweight Rule Induction (LRI) finds patterns in data and classifies it by generating compact, logical (true/false) decision rules that follow a rule-set design specified by the user. LRI's rule induction method minimizes misclassification errors. Each class is described by the same number of unweighted rules. A new case is classified by applying all the rules and assigning the case to the class with the most satisfied rules. The overall design is specified by setting limits on the size and number of rules. Experimental results on large, standard data sets demonstrate that LRI's predictive performance can rival the best results reported in the literature.
LRI is available for download on IBM alphaWorks.
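The LRI classification step (equal numbers of unweighted rules per class, with the class that has the most satisfied rules winning) can be illustrated with a short Python sketch. The rules are hand-written and purely hypothetical, as are the feature names `income`, `debt`, and `years_employed`. LRI's actual induction of rules from data is not shown, and breaking ties by class order is an assumption of this sketch.

```python
from typing import Callable, Dict, List, Mapping

# A rule is a conjunction of true/false tests on a record's features.
Record = Mapping[str, float]
Rule = Callable[[Record], bool]


def classify(record: Record, rules_by_class: Dict[str, List[Rule]]) -> str:
    """Assign the record to the class with the most satisfied rules.

    Rules are unweighted, so each satisfied rule counts as one vote. Ties go
    to the class listed first in rules_by_class (an assumption of this sketch).
    """
    if len({len(rules) for rules in rules_by_class.values()}) != 1:
        raise ValueError("every class must have the same number of rules")
    votes = {cls: sum(rule(record) for rule in rules)
             for cls, rules in rules_by_class.items()}
    return max(votes, key=votes.get)


# Hypothetical rule set: two compact rules per class.
RULES: Dict[str, List[Rule]] = {
    "approve": [
        lambda r: r["income"] > 50 and r["debt"] < 20,
        lambda r: r["years_employed"] >= 3,
    ],
    "decline": [
        lambda r: r["debt"] >= 20,
        lambda r: r["income"] <= 30 and r["years_employed"] < 1,
    ],
}

if __name__ == "__main__":
    print(classify({"income": 60, "debt": 10, "years_employed": 5}, RULES))  # approve
```

Because every class has the same number of rules and no rule carries a weight, the vote counts are directly comparable across classes.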




Lightweight Document Matcher (LDM)

The Lightweight Document Matcher is a pure Java document matching package that can run on any platform. It is useful as a text search engine when a small footprint is required and the entire document collection resides on the same platform as the search engine. The Lightweight Document Matcher combines a novel matching algorithm with a stripped-down document indexing mechanism to provide powerful text search. In addition, it accepts an unrestricted text string as the input query.

The solution is now available as a package for embedding in software systems, as well as for trial evaluations by interested parties.
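The page describes LDM's matching algorithm only as novel, so the following Python sketch is a generic stand-in rather than LDM's method. It shows the general shape of a stripped-down, in-memory inverted index that accepts an unrestricted text query and ranks documents by the summed inverse document frequency (IDF) of the query terms they share. The class name `TinyIndex` and the scoring formula are assumptions made for illustration.

```python
import math
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """A stripped-down in-memory inverted index (hypothetical, not LDM's design)."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)
        self.docs: List[str] = []

    def add(self, text: str) -> int:
        """Index a document and return its id."""
        doc_id = len(self.docs)
        self.docs.append(text)
        for term in set(tokenize(text)):
            self.postings[term].add(doc_id)
        return doc_id

    def search(self, query: str, k: int = 3) -> List[Tuple[int, float]]:
        """Rank documents for a free-text query by the summed IDF of shared terms."""
        n = len(self.docs)
        scores: Dict[int, float] = defaultdict(float)
        for term in set(tokenize(query)):
            matching = self.postings.get(term)
            if not matching:
                continue
            idf = math.log((n + 1) / len(matching))  # rarer terms weigh more
            for doc_id in matching:
                scores[doc_id] += idf
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        return ranked[:k]


if __name__ == "__main__":
    index = TinyIndex()
    index.add("printer driver fails to install on Windows")
    index.add("how to reset a forgotten password")
    index.add("printer prints blank pages after driver update")
    print(index.search("my printer driver will not install"))  # doc 0 first, then doc 2
```

Query words that appear in no document (here "my", "will", and "not") are simply ignored, which is one way to tolerate unrestricted query text without a separate parsing step.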





Model Case Generator (MCG)

This software kit mines a large collection of documents to automatically generate cleansed and summarized exemplars from the source repository. It can be a useful tool for managing very large collections of call center reports and logs, which are typically recorded by different individuals at different times and vary significantly in structure. Applying the Model Case Generator makes it possible to quickly extract the most common (or most frequently asked) types of topical documents from the source. This can be an effective way to generate FAQs automatically or to assist an FAQ author.

The solution is now available for use in pilot engagements, as well as for trial evaluations by interested parties.
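The page does not describe MCG's algorithms. The following Python sketch therefore shows one common, generic way to surface the most common types of reports: group documents by word overlap (Jaccard similarity, with greedy leader clustering) and use the most central member of each group (its medoid) as the exemplar. The similarity measure, the 0.3 threshold, and all function names are assumptions, and the cleansing and summarization steps are omitted.

```python
import re
from typing import List, Set, Tuple


def terms(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens of a report."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two token sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def model_cases(docs: List[str], threshold: float = 0.3) -> List[Tuple[str, int]]:
    """Return (exemplar, group size) pairs, most common topic first."""
    sets = [terms(d) for d in docs]
    groups: List[List[int]] = []
    for i, s in enumerate(sets):
        # Greedy leader clustering: join the first group whose leader is similar.
        for group in groups:
            if jaccard(s, sets[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])

    cases = []
    for group in groups:
        # Exemplar (medoid): the member with the highest total similarity to its group.
        medoid = max(group, key=lambda m: sum(jaccard(sets[m], sets[o]) for o in group))
        cases.append((docs[medoid], len(group)))
    cases.sort(key=lambda case: -case[1])
    return cases
```

Sorting groups by size puts the most frequently reported issue first, which is the ordering an FAQ author would usually want.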





Advanced Targeted Marketing for Single Events (ATM-SE)

This solution was developed specifically for mining large catalog mailing history databases to build response forecasting models for use in marketing campaigns. It was developed in partnership with a major direct mail company, which supplied the problem and requirements specifications, and it is intended to generate promotional response models automatically. Using forecasting models to predict which customers will respond positively to a marketing stimulus is important for the retail industry, because advertising expenses for marketing campaigns tend to be extremely high relative to net profit. Any reduction in marketing expense therefore improves net profit directly and significantly.

We worked with an acknowledged leader in the analytics required to build such models. The new IBM technology embedded in ATM-SE outperforms the firm's proprietary methods. Moreover, while the firm's methods can take up to two weeks of human and computer time to build a forecasting model, ATM-SE delivers its results in 48 hours or less of compute-intensive processing. An ATM-SE beta is installed and running at this direct mail firm's marketing office, and the firm plans to use it in its mailings.

The solution is now available for use in consulting engagements, as well as for pilot evaluations by interested parties.
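The economic argument for response models (high campaign cost relative to net profit) can be made concrete with a small Python sketch. ATM-SE's models and selection policy are not described here, so this assumes a simple expected-value rule: mail a customer only when the predicted response probability times the net margin per responder exceeds the cost of one mailing. The customer scores, margin, and cost figures are hypothetical.

```python
from typing import Dict, List


def select_mailing(scores: Dict[str, float], margin: float, cost: float) -> List[str]:
    """Return customers whose expected profit from one mailing is positive.

    scores: predicted response probability for each customer (model output).
    margin: net profit from one responder (hypothetical figure).
    cost:   cost of mailing one customer (hypothetical figure).
    """
    return [cid for cid, p in scores.items() if p * margin - cost > 0]


def expected_profit(scores: Dict[str, float], mailed: List[str],
                    margin: float, cost: float) -> float:
    """Sum of expected profit over the customers who receive a mailing."""
    return sum(scores[cid] * margin - cost for cid in mailed)


if __name__ == "__main__":
    # Hypothetical model scores for four customers.
    scores = {"c1": 0.10, "c2": 0.02, "c3": 0.05, "c4": 0.01}
    chosen = select_mailing(scores, margin=40.0, cost=1.5)
    print(chosen)  # ['c1', 'c3']
    print(expected_profit(scores, chosen, 40.0, 1.5) >
          expected_profit(scores, list(scores), 40.0, 1.5))  # True
```

In this toy example, skipping the two low-scoring customers saves their mailing cost, which exceeds the revenue they are expected to bring in. This is why a more accurate response model translates directly into net profit.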




Underwriting Profitability Analysis solution (UPA)


The UPA (Underwriting Profitability Analysis) application embodies a new approach to mining Property & Casualty (P&C) insurance policy and claims data to construct predictive models of insurance risk. UPA uses the ProbE (Probabilistic Estimation) predictive modeling class library to discover risk characterization rules by analyzing large, noisy insurance data sets. Each rule defines a distinct risk group and its level of risk. To satisfy regulatory constraints, the risk groups are mutually exclusive and exhaustive. The rules generated by ProbE are statistically rigorous, interpretable, and credible from an actuarial standpoint. The ProbE library itself is scalable, extensible, and embeddable. Our approach to modeling insurance risks, and its implementation, have been validated in an actual engagement with a P&C firm. The benefit assessment of the results suggests that this methodology provides significant value to the P&C insurance risk management process.

The UPA solution is currently available in the marketplace as a component of IBM's Decision Edge for Insurance warehouse and data mining solution suite, as well as for use in customer consulting engagements.
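The requirement that the risk groups be mutually exclusive and exhaustive can be checked mechanically: every policy must match exactly one rule. In the following Python sketch, the rules, the field names (`driver_age`, `annual_miles`, `claims`, `exposure_years`), and the use of claim frequency per exposure-year as the "level of risk" are all hypothetical. In UPA the rules are discovered by ProbE, and the risk measure it uses is not specified here.

```python
from typing import Callable, Dict, List, Mapping

Policy = Mapping[str, float]

# Hypothetical risk-group rules; in UPA, such rules are discovered by ProbE.
RISK_GROUPS: Dict[str, Callable[[Policy], bool]] = {
    "young_driver": lambda p: p["driver_age"] < 25,
    "experienced_high_mileage": lambda p: p["driver_age"] >= 25 and p["annual_miles"] > 15000,
    "experienced_low_mileage": lambda p: p["driver_age"] >= 25 and p["annual_miles"] <= 15000,
}


def assign_group(policy: Policy) -> str:
    """Return the one risk group that covers the policy.

    Raises ValueError when zero or several groups match, i.e. when the rules
    are not mutually exclusive and exhaustive for this policy.
    """
    matches = [name for name, rule in RISK_GROUPS.items() if rule(policy)]
    if len(matches) != 1:
        raise ValueError(f"policy matched {len(matches)} risk groups: {matches}")
    return matches[0]


def claim_frequency(policies: List[Policy]) -> Dict[str, float]:
    """Claims per exposure-year in each risk group (an assumed risk measure)."""
    claims = {name: 0.0 for name in RISK_GROUPS}
    exposure = {name: 0.0 for name in RISK_GROUPS}
    for policy in policies:
        group = assign_group(policy)
        claims[group] += policy["claims"]
        exposure[group] += policy["exposure_years"]
    return {g: claims[g] / exposure[g] for g in RISK_GROUPS if exposure[g] > 0}
```

Running `assign_group` over an entire book of policies is a cheap way to confirm that a candidate rule set leaves no policy unpriced and no policy priced twice.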




Probabilistic Estimation class library (ProbE)


The ProbE (Probabilistic Estimation) class library is a data modeling framework geared toward rule induction algorithms. It is embeddable, i.e., intended for building customized solutions, and can be packaged as a kernel together with settings and results files. ProbE is also extensible, i.e., designed for the seamless incorporation of diverse data models.

The ProbE class library is written in C++ and has two clearly defined sets of APIs, one for extension and one for embedding. It is designed to use the data access API of IBM Intelligent Miner, and it was designed with data-parallel implementations and system error-recovery support in mind.

ProbE is available as a research prototype for select customer engagements.
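The split between an extension API (adding new data models) and an embedding API (driving the kernel from a host application) can be illustrated with a Python sketch of the general pattern. ProbE itself is a C++ library, and every name here (`Model`, `register`, `MajorityClass`, `run_kernel`, and the settings keys) is hypothetical rather than part of ProbE's actual API. Plain dictionaries stand in for the settings and results files.

```python
from abc import ABC, abstractmethod
from collections import Counter
from typing import Any, Callable, Dict, List, Sequence, Type

Row = Dict[str, Any]


class Model(ABC):
    """Extension API: each new data model subclasses this interface."""

    @abstractmethod
    def fit(self, rows: Sequence[Row], target: str) -> None: ...

    @abstractmethod
    def predict(self, row: Row) -> Any: ...


REGISTRY: Dict[str, Type[Model]] = {}


def register(name: str) -> Callable[[Type[Model]], Type[Model]]:
    """Class decorator that makes a model available to the kernel by name."""
    def wrap(cls: Type[Model]) -> Type[Model]:
        REGISTRY[name] = cls
        return cls
    return wrap


@register("majority")
class MajorityClass(Model):
    """Trivial example extension: always predicts the most frequent target value."""

    def fit(self, rows: Sequence[Row], target: str) -> None:
        self.value = Counter(r[target] for r in rows).most_common(1)[0][0]

    def predict(self, row: Row) -> Any:
        return self.value


def run_kernel(settings: Dict[str, Any], train: Sequence[Row],
               test: Sequence[Row]) -> Dict[str, Any]:
    """Embedding API: the host passes settings and receives results.

    Plain dictionaries stand in for settings and results files.
    """
    target = settings["target"]
    model = REGISTRY[settings["model"]]()
    model.fit(train, target)
    predictions: List[Any] = [model.predict(r) for r in test]
    correct = sum(p == r[target] for p, r in zip(predictions, test))
    return {"predictions": predictions, "accuracy": correct / len(test)}
```

With this separation, a new modeling algorithm only needs to implement the extension interface, and existing host applications can use it by changing a settings value.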




Rule Abstraction for Modeling and Prediction (RAMP)


The Rule Abstraction for Modeling and Prediction (RAMP) system is a research prototype that packages a collection of innovative algorithms for classification and regression modeling.

Overview

Generating accurate and robust models is crucial to the successful use and deployment of classifiers on a large scale. Rule induction, i.e., generating decision rule models from data, is often a preferred approach to classification modeling and prediction, due to the enhanced explanatory capability and interpretability of decision rules.

The RAMP system for rules abstraction and modeling is evolving with accuracy and robustness as its primary goals. The system provides the following key capabilities:

1. feature analysis and selection based on the contextual merit technique
2. optimal discretization of numerical features based on dynamic programming
3. generation of minimal DNF (Disjunctive Normal Form) rules based on the R-MINI algorithm
4. rule-based regression
5. rule pruning, weighting, and editing
6. alternative rule application strategies
7. accuracy evaluation of the model on test data
8. hierarchical case management, which helps end users carry out multiple experiments on a data set and manage these experiments as a set of related cases.
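The idea behind capability 2, optimal discretization by dynamic programming, can be sketched in Python. The criterion RAMP actually optimizes is not stated here, so this sketch assumes a simple one: split a numeric feature into at most k intervals, let each interval predict its majority class, and minimize the total number of misclassified training examples. Cuts are allowed only between distinct values, and the DP runs in O(k n^2) time over the n sorted examples. The function name and criterion are assumptions.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def optimal_discretization(
    values: Sequence[float], labels: Sequence[Hashable], k: int
) -> Tuple[List[float], int]:
    """Split a numeric feature into at most k intervals with minimum total error.

    Each interval predicts its majority class, and its cost is the number of
    examples it misclassifies (an assumed criterion for this sketch). Returns
    the cut points and the minimum number of errors.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    if n == 0 or k < 1:
        raise ValueError("need at least one example and k >= 1")

    # cost[i][j]: errors if examples i..j-1 form a single interval.
    cost = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        counts: Counter = Counter()
        for j in range(i + 1, n + 1):
            counts[pairs[j - 1][1]] += 1
            cost[i][j] = (j - i) - max(counts.values())

    # An interval may end at j only between two distinct values (or at the end).
    legal = [False] + [j == n or pairs[j - 1][0] < pairs[j][0] for j in range(1, n + 1)]

    inf = float("inf")
    # best[m][j]: minimum errors covering examples 0..j-1 with exactly m intervals.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0
    for m in range(1, k + 1):
        for j in range(1, n + 1):
            if not legal[j]:
                continue
            for i in range(j):
                candidate = best[m - 1][i] + cost[i][j]
                if candidate < best[m][j]:
                    best[m][j] = candidate
                    back[m][j] = i

    # Use the fewest intervals that achieve the minimum error.
    m = min(range(1, k + 1), key=lambda count: best[count][n])
    total = best[m][n]

    # Walk the back-pointers to recover the cut points (midpoints between values).
    cuts: List[float] = []
    j = n
    while m > 0:
        i = back[m][j]
        if i > 0:
            cuts.append((pairs[i - 1][0] + pairs[i][0]) / 2)
        j, m = i, m - 1
    return sorted(cuts), int(total)
```

For example, the values 1 through 8 labeled a, a, a, b, b, b, a, a split with k = 3 at 3.5 and 6.5 with zero errors. Limited to k = 2, the best single cut is at 3.5, which leaves two errors.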

RAMP has been used in several large-scale, real-world applications and benchmark tasks, which demonstrate its robustness. A detailed description of the system is available in the IBM Research Division technical report "RAMP: Rules Abstraction for Modeling and Prediction" by C. Apte, S.J. Hong, J. Lepre, S. Prasad, and B. Rosen, IBM RC-20271.




Revised January 20, 2003