Finally, we present simulation results for two no regret pricing strategies, namely no external regret (NER) and no internal regret (NIR). As described in [16], there are a number of learning algorithms that satisfy the no external regret optimality criterion (e.g., Foster and Vohra [11] and Freund and Schapire [14]); similarly, the no internal regret optimality criterion is satisfied by algorithms due to both Foster and Vohra [12] and Hart and Mas-Colell [22]. In this section, we discuss simulations of NER pricebots á la Freund and Schapire and NIR pricebots á la Foster and Vohra. Rather than consider 5 pricebots as above, we limit our attention to merely 2 NR pricebots, since the dynamics of 2 such pricebots converges more readily than does that of 5. As no regret algorithms are inherently non-deterministic as well as myopic, they are candidates for learning mixed strategy equilibria of stage games.
Figure 4: (a) and (b) Price dynamics for 2 NER pricebots with learning rate
, and
and
, respectively.
(c) CDFs generated from simulation tape of 2 NIR pricebots with
theoretical Nash equilibrium overlayed.
Although the no external regret algorithm of Freund and Schapire has
been observed to converge to Nash equilibrium in games of 2 actions
(e.g., the Santa Fe bar problem, see Greenwald, et al. [18]), NER
pricebots cycle exponentially from price
to price
in the
prescribed model, which entertains n > 2 possible
actions
(see
Fig. 4(a)). In fact, the outcome of play of NER pricebots
in the shopbot game is reminiscent of the outcome of both NER learning
and fictitious play in the Shapley game, a game of 3 strategies for
which there is no pure strategy Nash equilibrium (see Greenwald,
et al. [17]). Fig. 4(b) depicts
simulations of NER pricebots in which we introduce a responsiveness
parameter
that exponentially weights the observed
history, effectively limiting its length to the finite value
, thereby allowing NER pricebots to more readily respond to
environmental changes. Interestingly, the empirical frequency of play
during these finite cycles mimics the symmetric mixed strategy Nash
equilibrium of the stage game; this result will be further explored in
a future publication.
We now investigate the properties of no internal regret (NIR) learning
in our model of shopbots and pricebots. Learning algorithms that
satisfy the no internal regret optimality criteria are known to
converge to correlated equilibria [2, 12], a
superset of the set of Nash equilibria which allows for dependence
among players' strategies. Despite several negative theoretical
results on the rational learning of Nash equilibria (e.g., Foster and
Young [13], Nachbar [27], and Greenwald
et al. [18]), in practice, no internal regret learning -- a
form of boundedly rational learning -- seems to learn Nash
equilibria. In our present simulations we observe convergence to the
symmetric mixed strategy Nash equilibrium. Fig. 4(c)
depicts the cumulative distribution functions generated from
simulation tapes of the no internal regret learning algorithm designed
by Foster and Vohra [12]. In these simulations, we
consider 2 NIR pricebots, and we let buyer distributions range from
to
. These plots
closely match the theoretical Nash equilibria given 2 sellers, which
are overlayed in this figure.