We now describe an algorithm due to Foster and Vohra [12] which achieves no internal regret, and a simple implementation due to Hart and Mas-Colell [22]. Learning via the following no internal regret algorithms converges to correlated equilibrium [2, 12].
The regret felt by a player at time t is formulated as the
difference between the payoffs obtained by utilizing strategy the
player's strategy of choice, say i, and the payoffs that could have
been achieved had strategy
been played instead:
where
is the indicator
function,
which has value 1 if strategy i is employed at time t,
and has value 0 otherwise. Now the cumulative regret
is the summation of regrets
from i to j through time T:
Internal regret is defined as follows:
where
. Finally, the cumulative internal regret
felt for playing all other strategies but for not having played
strategy j throughout the course of a game is given by:
Given the above definitions, consider the case of a 2-strategy
informed game, with strategies X and Y. The no internal regret
learning algorithm updates the components of the weight vector, namely
and
, according to the following formulae,
which reflect cumulative feelings of regret:
If the regret for having played strategy j rather than strategy i is significant, then the algorithm updates weights such that the probability of playing strategy i is increased. In general, if strategy i is played at time t,
where
is a normalizing term that is chosen s.t.:
This generalized algorithm is due to Hart and Mas-Colell [22].
Like the no external regret algorithm of Freund and Schapire [14], the above no internal regret algorithm depends on complete payoff information at all times t, including information that pertains to strategies that were not employed at time t. The no internal regret learning algorithm has also been studied in naive settings, where complete payoff information is not available (see Foster and Vohra [11] and Greenwald [16]). It remains to simulate the naive variant of the no internal regret learning algorithm in our model of shopbots and pricebots.