next up previous
Next: Evolution of Shopbots and Up: Simulations Previous: DF Pricebots

NR Pricebots

 

Finally, we present simulation results for two no regret pricing strategies, namely no external regret (NER) and no internal regret (NIR). As described in [16], there are a number of learning algorithms that satisfy the no external regret optimality criterion (e.g., Foster and Vohra [11] and Freund and Schapire [14]); similarly, the no internal regret optimality criterion is satisfied by algorithms due to both Foster and Vohra [12] and Hart and Mas-Colell [22]. In this section, we discuss simulations of NER pricebots á la Freund and Schapire and NIR pricebots á la Foster and Vohra. Rather than consider 5 pricebots as above, we limit our attention to merely 2 NR pricebots, since the dynamics of 2 such pricebots converges more readily than does that of 5. As no regret algorithms are inherently non-deterministic as well as myopic, they are candidates for learning mixed strategy equilibria of stage games.

   figure362
Figure 4: (a) and (b) Price dynamics for 2 NER pricebots with learning rate tex2html_wrap_inline1061 , and tex2html_wrap_inline1063 and tex2html_wrap_inline1065 , respectively. (c) CDFs generated from simulation tape of 2 NIR pricebots with theoretical Nash equilibrium overlayed.

Although the no external regret algorithm of Freund and Schapire has been observed to converge to Nash equilibrium in games of 2 actions (e.g., the Santa Fe bar problem, see Greenwald, et al. [18]), NER pricebots cycle exponentially from price tex2html_wrap_inline1429 to price tex2html_wrap_inline1483 in the prescribed model, which entertains n > 2 possible actionsgif (see Fig. 4(a)). In fact, the outcome of play of NER pricebots in the shopbot game is reminiscent of the outcome of both NER learning and fictitious play in the Shapley game, a game of 3 strategies for which there is no pure strategy Nash equilibrium (see Greenwald, et al. [17]). Fig. 4(b) depicts simulations of NER pricebots in which we introduce a responsiveness parameter tex2html_wrap_inline1631 that exponentially weights the observed history, effectively limiting its length to the finite value tex2html_wrap_inline1633 , thereby allowing NER pricebots to more readily respond to environmental changes. Interestingly, the empirical frequency of play during these finite cycles mimics the symmetric mixed strategy Nash equilibrium of the stage game; this result will be further explored in a future publication.

We now investigate the properties of no internal regret (NIR) learning in our model of shopbots and pricebots. Learning algorithms that satisfy the no internal regret optimality criteria are known to converge to correlated equilibria [2, 12], a superset of the set of Nash equilibria which allows for dependence among players' strategies. Despite several negative theoretical results on the rational learning of Nash equilibria (e.g., Foster and Young [13], Nachbar [27], and Greenwald et al. [18]), in practice, no internal regret learning -- a form of boundedly rational learning -- seems to learn Nash equilibria. In our present simulations we observe convergence to the symmetric mixed strategy Nash equilibrium. Fig. 4(c) depicts the cumulative distribution functions generated from simulation tapes of the no internal regret learning algorithm designed by Foster and Vohra [12]. In these simulations, we consider 2 NIR pricebots, and we let buyer distributions range from tex2html_wrap_inline1635 to tex2html_wrap_inline1637 . These plots closely match the theoretical Nash equilibria given 2 sellers, which are overlayed in this figure.


next up previous
Next: Evolution of Shopbots and Up: Simulations Previous: DF Pricebots

kephart
Thu Nov 18 11:55:53 EST 1999