

Hong, S. J., Hosking, J. R. M., and Winograd, S. (1996). Use of randomization to normalize feature merits. In Information, Statistics and Induction in Science, Proceedings of the ISIS 96 conference, eds. D. L. Dowe, K. B. Korb and J. J. Oliver, pp. 10-19. World Scientific, Singapore.

Abstract. Feature merits are used for feature selection in classification and regression as well as for decision tree generation. Commonly used merit functions exhibit a bias towards features that take a large variety of values. We present a scheme based on randomization for neutralizing this bias by normalizing the merits. The merit of a feature is normalized by division by the expected merit of a feature that is random noise taking the same distribution of values as the given feature. The noise feature is obtained by randomly permuting the values of the given feature. The scheme can be used for any merit function including the Gini and entropy measures. We demonstrate its effectiveness by applying it to the contextual merit defined by S. J. Hong ["Use of contextual information for feature ranking and discretization", IBM Research Report RC19664, 1994].
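The normalization described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses entropy-based information gain as the merit function (the abstract says the scheme applies to any merit, including entropy measures), and it estimates the expected merit of the noise feature by averaging over a fixed number of random permutations. The function names, the `n_permutations` and `seed` parameters, and the handling of a zero denominator are all hypothetical choices made for this example. The contextual merit itself is not reproduced here.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy-based merit: reduction in label entropy after splitting on feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def normalized_merit(feature, labels, merit=information_gain,
                     n_permutations=200, seed=0):
    """Divide the merit of a feature by the average merit of permuted copies.

    Permuting the feature values keeps their distribution but breaks any
    association with the labels, so each permuted copy acts as a noise feature.
    The number of permutations and the seed are assumptions of this sketch.
    """
    rng = random.Random(seed)
    shuffled = list(feature)
    total = 0.0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        total += merit(shuffled, labels)
    expected_noise = total / n_permutations
    if expected_noise == 0:
        # Assumption: a feature whose noise merit is zero (for example, a
        # constant feature) is given a normalized merit of zero.
        return 0.0
    return merit(feature, labels) / expected_noise


if __name__ == "__main__":
    labels = [0, 1] * 10
    ids = list(range(20))   # many-valued feature with no real predictive content
    informative = list(labels)  # binary feature that matches the labels
    print("raw:", information_gain(ids, labels), information_gain(informative, labels))
    print("normalized:", normalized_merit(ids, labels), normalized_merit(informative, labels))
```

In this toy example both features have the same raw information gain, because a feature with all-distinct values splits the data into pure singleton groups. After normalization the many-valued feature scores no better than its own permuted copies, while the binary feature that actually tracks the labels scores well above them, which is the bias correction the abstract describes.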

