next up previous
Next: No Internal Regret Learning Up: No Regret Learning Previous: No Regret Learning

No External Regret Learning

Freund and Schapire [14] derive an algorithm that achieves no external regret via multiplicative updating. Their algorithm is dependent on the cumulative payoffs achieved by all strategies, including the surmised payoffs of strategies which are not played. In particular, let tex2html_wrap_inline2052 denote the cumulative payoffs obtained through time t via strategy i, which is computed as follows: tex2html_wrap_inline2058 . Now the weight assigned to strategy i at time t+1, for tex2html_wrap_inline2064 , is given by:

  equation570

The multiplicative updating rule given in Equation 17 can be modified to become applicable in naive settings, where complete payoff information is not available, but rather the only payoff information known at time t is that which pertains to the strategy which was in fact employed at time t. Such a variant of this multiplicative updating algorithm appears in Auer, Cesa-Bianchi, Freund, and Schapire [1]. It remains to perform simulations of this naive algorithm in our model of shopbots and pricebots.



kephart
Thu Nov 18 11:55:53 EST 1999