This appendix describes the no regret learning algorithms which are
simulated in Sec. 5.4. There are two no regret criteria
of interest, namely no external regret and no internal regret.
Computational learning theorists consider the difference between the
expected payoffs that are achieved by the strategies prescribed by a
given algorithm, as compared to the payoffs that could be achieved by
any other fixed sequence of decisions, in the worst-case. If the
difference between these two sums is negligible, then the algorithm
exhibits no external regret. Early no external regret
algorithms appeared in Blackwell [4],
Hannan [21], and Banos [3]. Game theorists Foster
and Vohra [12] study an alternative measure of worst-case
performance. If the difference between the cumulative payoffs that
are achieved by a sequence of strategies generated by a given
algorithm in comparison with the cumulative payoffs that could be
achieved by a remapped sequence of strategies is insignificant, then
the algorithm is said to exhibit no internal
regret.
No internal regret implies no external
regret.
The no regret algorithms are presented from the point of view of an
individual player, as if that player were playing a game against
nature, where nature is taken to be a conglomeration of all its
opponents. From this perspective, let
denote the payoffs
obtained by the player of interest at time t via strategy i.
Mixed strategy weights at time t are given by the probability vector
, for
, where S is the number of strategies.