next up previous
Next: No External Regret Learning Up: Shopbots and Pricebots Previous: Pure Strategy Nash Equilibria

No Regret Learning

 

This appendix describes the no regret learning algorithms which are simulated in Sec. 5.4. There are two no regret criteria of interest, namely no external regret and no internal regret. Computational learning theorists consider the difference between the expected payoffs that are achieved by the strategies prescribed by a given algorithm, as compared to the payoffs that could be achieved by any other fixed sequence of decisions, in the worst-case. If the difference between these two sums is negligible, then the algorithm exhibits no external regret. Early no external regret algorithms appeared in Blackwell [4], Hannan [21], and Banos [3]. Game theorists Foster and Vohra [12] study an alternative measure of worst-case performance. If the difference between the cumulative payoffs that are achieved by a sequence of strategies generated by a given algorithm in comparison with the cumulative payoffs that could be achieved by a remapped sequence of strategies is insignificant, then the algorithm is said to exhibit no internal regret.gif No internal regret implies no external regret.

The no regret algorithms are presented from the point of view of an individual player, as if that player were playing a game against nature, where nature is taken to be a conglomeration of all its opponents. From this perspective, let tex2html_wrap_inline2038 denote the payoffs obtained by the player of interest at time t via strategy i. Mixed strategy weights at time t are given by the probability vector tex2html_wrap_inline2046 , for tex2html_wrap_inline2048 , where S is the number of strategies.





kephart
Thu Nov 18 11:55:53 EST 1999