Classification and regression trees are tree structures that predict the value of a target attribute of interest. Classification trees are also called decision trees.

Given a database with a target attribute and conditional attributes, we would like to construct a classification tree or regression tree that accurately predicts the target attribute values for unseen data.

Because learned knowledge should be concise, decision trees should be small. However, constructing the minimum tree is NP-hard, so entropy-based heuristics are typically used instead.
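The entropy-based heuristic chooses, at each node, the split that most reduces the entropy of the class distribution (the information gain). A minimal sketch of that computation, with hypothetical helper names:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, left, right):
    """Reduction in entropy when `labels` is split into `left` and `right`."""
    n = len(labels)
    return (entropy(labels)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))
```

A split that separates the classes perfectly, e.g. `['a', 'a', 'b', 'b']` into `['a', 'a']` and `['b', 'b']`, yields the maximum gain of 1 bit; the heuristic greedily picks the highest-gain split at each node.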


For regression trees, we use the mean squared error as the objective function.
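Concretely, a candidate split is scored by the weighted mean squared error of the target values in the two resulting subsets, and the split with the lowest score is chosen. A small sketch, with hypothetical function names:

```python
def mse(values):
    """Mean squared error of numeric target values around their mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def split_mse(left, right):
    """Weighted MSE of a candidate split; the objective to minimize."""
    n = len(left) + len(right)
    return (len(left) * mse(left) + len(right) * mse(right)) / n
```

For example, splitting the targets `[1, 1, 3, 3]` into `[1, 1]` and `[3, 3]` drives the weighted MSE from 1.0 down to 0.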

Conventional methods use "guillotine-cut splitting", a threshold test on a single numeric conditional attribute, as the test for numeric attributes. Guillotine-cut splitting tends to generate large trees when there are correlated attributes.
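A guillotine cut simply partitions the records by comparing one numeric attribute against a threshold, so any boundary that depends jointly on two correlated attributes must be approximated by a staircase of many such cuts. A minimal illustration (the attribute name and row representation are assumptions for the example):

```python
def guillotine_split(rows, attr, threshold):
    """Partition rows by a single-attribute threshold test (a guillotine cut).

    Each row is a dict mapping attribute names to numeric values.
    """
    left = [r for r in rows if r[attr] <= threshold]
    right = [r for r in rows if r[attr] > threshold]
    return left, right
```

In practice the threshold itself is chosen by scanning candidate cut points and taking the one that optimizes the node's objective (entropy or MSE).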

To overcome this problem, we introduce "region splitting". Region splitting divides the data into two subsets: the records inside and the records outside a 2-dimensional region.
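As an illustrative sketch, the membership test below uses an axis-parallel rectangle over two attributes; the actual regions used by the method are optimized for the tree's objective, and the attribute names here are assumptions for the example:

```python
def region_split(rows, ax, ay, rect):
    """Partition rows by membership in a 2-D region over attributes ax, ay.

    Here the region is an axis-parallel rectangle (x0, x1, y0, y1);
    one region test can replace a staircase of several guillotine cuts.
    """
    x0, x1, y0, y1 = rect
    inside = [r for r in rows if x0 <= r[ax] <= x1 and y0 <= r[ay] <= y1]
    outside = [r for r in rows if not (x0 <= r[ax] <= x1 and y0 <= r[ay] <= y1)]
    return inside, outside
```

Because a single region test captures a 2-dimensional interaction between correlated attributes directly, the resulting tree needs far fewer internal nodes.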

We can find the best regions for this purpose, namely the regions that optimize entropy for decision trees and the regions that optimize mean squared error for regression trees. By using region splitting, we can obtain very compact and accurate trees.