
Apte, C., Hong, S. J., Hosking, J. R. M., Lepre, J.,
Pednault, E. P. D., and Rosen, B. K. (1998).
Decomposition of heterogeneous classification problems.
Intelligent Data Analysis, 2, 81-96.
An earlier version was published in
Advances in intelligent data analysis: reasoning about data,
Proceedings of the Second International Symposium, IDA-97, London,
August 1997, Lecture Notes in Computer Science, vol. 1280, ed. X. Liu,
P. Cohen and M. Berthold, pp. 17-28, Springer-Verlag, Berlin, 1997.
Abstract.
In some classification problems the feature space is heterogeneous
in that the best features on which to base the classification
are different in different parts of the feature space.
In some other problems the classes can be divided into subsets
such that distinguishing one subset of classes from another
and classifying examples within the subsets
require very different decision rules, involving different sets of features.
In such heterogeneous problems, many modeling techniques
(including decision trees, rules, and neural networks)
evaluate the performance of alternative decision rules
by averaging over the entire problem space, and so are prone to generating a
model that is suboptimal within individual regions or subproblems.
Better overall models can be obtained by splitting the problem
appropriately and modeling each subproblem separately.
This paper presents a new measure to determine the degree of dissimilarity
between the decision surfaces of two given problems, and suggests a way to
search for a strategic splitting of the feature space that identifies
regions with different characteristics.
We illustrate the concept using a multiplexor problem,
and apply the method to a DNA classification problem.
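
As a rough illustration of why a multiplexor problem is heterogeneous, the sketch below builds a 6-bit Boolean multiplexor (2 address bits, 4 data bits) and compares how well each single feature predicts the class over the whole space versus inside each address-defined region. The problem size, the function names, and the simple "agreement" score are assumptions chosen for illustration only; this is not the dissimilarity measure or the search procedure presented in the paper.

```python
from itertools import product

# Assumed problem size for illustration: 2 address bits select 1 of 4 data bits.
N_ADDR = 2
N_DATA = 2 ** N_ADDR


def address(bits):
    """Decode the address bits (the first N_ADDR positions) as an integer."""
    return bits[0] * 2 + bits[1]


def multiplexer(bits):
    """Class label: the data bit selected by the address bits."""
    return bits[N_ADDR + address(bits)]


def agreement(examples, feature):
    """Fraction of examples in which a single feature equals the class label."""
    hits = sum(1 for x in examples if x[feature] == multiplexer(x))
    return hits / len(examples)


# Enumerate the full feature space (64 examples).
examples = list(product((0, 1), repeat=N_ADDR + N_DATA))

# Averaged over the whole space, no single feature stands out.
global_scores = [agreement(examples, f) for f in range(N_ADDR + N_DATA)]

# Within each region defined by the address, one data bit is decisive.
regional_best = {}
for addr in range(N_DATA):
    region = [x for x in examples if address(x) == addr]
    scores = [agreement(region, f) for f in range(N_ADDR, N_ADDR + N_DATA)]
    best = scores.index(max(scores))
    regional_best[addr] = (N_ADDR + best, scores[best])

print("global agreement per feature:", global_scores)
print("best data feature per region:", regional_best)
```

Over the whole space each data bit agrees with the class in 62.5% of examples and each address bit in 50%, so an evaluation that averages globally sees four equally mediocre data features. Once the space is split on the address, a different data bit predicts the class perfectly in each region, which is the kind of structure that splitting the problem is meant to expose.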