next up previous
Next: About this document Up: No Regret Learning Previous: No External Regret Learning

No Internal Regret Learning

We now describe an algorithm due to Foster and Vohra [12] which achieves no internal regret, and a simple implementation due to Hart and Mas-Colell [22]. Learning via the following no internal regret algorithms converges to correlated equilibrium [2, 12].

The regret felt by a player at time t is formulated as the difference between the payoffs obtained by utilizing strategy the player's strategy of choice, say i, and the payoffs that could have been achieved had strategy tex2html_wrap_inline2074 been played instead:

  equation583

where tex2html_wrap_inline2076 is the indicator function, which has value 1 if strategy i is employed at time t, and has value 0 otherwise. Now the cumulative regret tex2html_wrap_inline2086 is the summation of regrets from i to j through time T:

  equation590

Internal regret is defined as follows:

  equation596

where tex2html_wrap_inline2094 . Finally, the cumulative internal regret felt for playing all other strategies but for not having played strategy j throughout the course of a game is given by:

equation601

Given the above definitions, consider the case of a 2-strategy informed game, with strategies X and Y. The no internal regret learning algorithm updates the components of the weight vector, namely tex2html_wrap_inline2104 and tex2html_wrap_inline2106 , according to the following formulae, which reflect cumulative feelings of regret:

eqnarray608

If the regret for having played strategy j rather than strategy i is significant, then the algorithm updates weights such that the probability of playing strategy i is increased. In general, if strategy i is played at time t,

  eqnarray619

where tex2html_wrap_inline2118 is a normalizing term that is chosen s.t.:

  eqnarray633

This generalized algorithm is due to Hart and Mas-Colell [22].

Like the no external regret algorithm of Freund and Schapire [14], the above no internal regret algorithm depends on complete payoff information at all times t, including information that pertains to strategies that were not employed at time t. The no internal regret learning algorithm has also been studied in naive settings, where complete payoff information is not available (see Foster and Vohra [11] and Greenwald [16]). It remains to simulate the naive variant of the no internal regret learning algorithm in our model of shopbots and pricebots.


next up previous
Next: About this document Up: No Regret Learning Previous: No External Regret Learning

kephart
Thu Nov 18 11:55:53 EST 1999