Classification and regression trees are tree structures that predict
the value of a target attribute of interest. Classification trees are
also called decision trees. Given a database with a target attribute and conditional attributes, we would like to construct a classification or regression tree that accurately predicts the target attribute values for unseen data.
Since learned knowledge should be compact, decision trees should be small.
However, constructing the minimum tree is NP-hard. Therefore, we
usually use entropy-based heuristics.
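As a concrete illustration of such a heuristic, the following sketch (an assumption about the general approach, not the authors' specific algorithm) picks the threshold on one numeric attribute that minimizes the weighted entropy of the two resulting subsets:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Greedily choose the split point on a single numeric attribute
    that minimizes the weighted entropy of the two subsets."""
    pairs = sorted(zip(values, labels))
    best = (float("inf"), None)  # (weighted entropy, threshold)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no valid threshold between equal values
        left = [l for _, l in pairs[:i]]
        right = [l for _, l in pairs[i:]]
        w = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        best = min(best, (w, t))
    return best
```

For example, `best_threshold([1, 2, 3, 4], [0, 0, 1, 1])` finds the threshold 2.5, which separates the two classes perfectly and yields zero weighted entropy.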
For regression trees, we use the mean squared error as the objective
function.
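For regression, the analogous split criterion can be sketched as the weighted mean squared error of the two subsets around their respective means (again a generic sketch, not the specific formulation in the text):

```python
def mse(ys):
    """Mean squared error of targets around their mean."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / len(ys)

def split_mse(values, targets, threshold):
    """Weighted MSE of the two subsets induced by a threshold
    on one numeric conditional attribute."""
    left = [t for v, t in zip(values, targets) if v <= threshold]
    right = [t for v, t in zip(values, targets) if v > threshold]
    return (len(left) * mse(left) + len(right) * mse(right)) / len(targets)
```

A split whose two sides have constant target values, such as `split_mse([1, 2, 3, 4], [1, 1, 5, 5], 2.5)`, attains the minimum weighted MSE of zero.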
Conventional methods use "guillotine-cut splitting" as the test for
numeric conditional attributes. Guillotine-cut splitting tends to
generate large trees when there are correlated attributes.
To overcome this problem, we introduce "region splitting",
which divides the data into two subsets: those inside and those outside
a 2-dimensional region.
We can find the best regions for this purpose, namely, the optimized
entropy regions for decision trees and the optimized mean squared
error regions for regression trees. By using region splitting, we can
obtain very compact and accurate trees.
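To make the idea concrete, the sketch below scores a region split by the weighted entropy of the inside/outside subsets, using an axis-parallel rectangle over two numeric attributes as a simplified region shape (the optimized regions referred to in the text may be more general than a rectangle):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def region_split_entropy(points, labels, region):
    """Weighted entropy of the two subsets produced by testing
    membership of each 2-D point in an axis-parallel rectangle.
    `region` is ((x_lo, x_hi), (y_lo, y_hi)); this rectangular
    shape is an illustrative assumption."""
    (x_lo, x_hi), (y_lo, y_hi) = region
    inside, outside = [], []
    for (x, y), l in zip(points, labels):
        if x_lo <= x <= x_hi and y_lo <= y <= y_hi:
            inside.append(l)
        else:
            outside.append(l)
    n = len(labels)
    return sum(len(s) * entropy(s) for s in (inside, outside) if s) / n
```

A single well-chosen region can isolate a cluster that would require several guillotine cuts: with points `[(0, 0), (1, 1), (5, 5), (6, 6)]`, labels `['a', 'a', 'b', 'b']`, and region `((-1, 2), (-1, 2))`, the inside/outside subsets are both pure and the weighted entropy is zero.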