
Asymmetric solution

 

In every run we observed, the asymmetric solution has the form illustrated in Figure 4. It can be described using one parameter (tex2html_wrap_inline586) for one agent (say, Agent A) and three parameters (tex2html_wrap_inline580, tex2html_wrap_inline582, and tex2html_wrap_inline584) for the other (say, Agent B). The response curves have the following functional forms:

  equation158

  equation166

Figure 4: Cross plot of the asymmetric response-function solutions at tex2html_wrap_inline570.

(Not all of these parameters are independent: B's price tex2html_wrap_inline582 lies just below A's price-war threshold tex2html_wrap_inline586, i.e., tex2html_wrap_inline906.) For sufficiently small tex2html_wrap_inline576, we can derive accurate approximations to tex2html_wrap_inline580, tex2html_wrap_inline582, tex2html_wrap_inline584, and tex2html_wrap_inline586.

First, consider tex2html_wrap_inline586, the lowest price that A is willing to undercut. At this price, A is on the verge of preferring to opt out of the price war and raise its price to v. Therefore, temporarily ignoring the requirement that tex2html_wrap_inline586 be an integer, we seek the value of tex2html_wrap_inline586 for which tex2html_wrap_inline930. Each of these two Q-values can be computed by following the corresponding price trajectory up to the point where it joins the price-war trajectory:

  eqnarray183

Equating the right-hand sides of these equations and rounding up to the nearest integer, we obtain

  equation186
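The trajectory-comparison argument can also be checked numerically. The following is a minimal sketch, not the authors' code, under a toy model that is entirely an assumption: integer prices 1 to v, alternating moves, a shopbot-style market in which the cheaper seller earns its price per period (zero marginal cost) and a tie splits the market, and a hypothetical threshold policy `respond` (undercut by one price unit at or above the threshold, otherwise reset to v) governing all future moves. The names `profit`, `respond`, `value_after`, `best_threshold`, and the truncation `horizon` are illustrative. Instead of equating two closed-form expressions and rounding up, it scans integer prices for the lowest one at which undercutting is at least as profitable as opting out; the two agree when the advantage of undercutting grows with price.

```python
# Toy model (hypothetical, not the authors' code): integer prices 1..v,
# alternating moves, and a shopbot-style market with zero marginal cost.

def profit(my_price: int, other_price: int) -> float:
    """Per-period profit: the cheaper seller takes all buyers; a tie splits them."""
    if my_price < other_price:
        return float(my_price)
    if my_price == other_price:
        return 0.5 * my_price
    return 0.0


def respond(opp_price: int, threshold: int, v: int) -> int:
    """Hypothetical threshold policy: undercut by one price unit while the
    opponent's price is at or above `threshold`, otherwise reset to v."""
    if opp_price >= threshold and opp_price > 1:
        return opp_price - 1
    return v


def value_after(a_price: int, b_price: int, thr_a: int, thr_b: int,
                v: int, gamma: float, horizon: int = 2000) -> float:
    """Discounted profit to A after A posts `a_price` against `b_price`,
    following the deterministic trajectory in which B moves next and the
    sellers then alternate (truncated after `horizon` moves)."""
    total = profit(a_price, b_price)
    disc = 1.0
    for step in range(horizon):
        disc *= gamma
        if step % 2 == 0:
            b_price = respond(a_price, thr_b, v)   # B's move
        else:
            a_price = respond(b_price, thr_a, v)   # A's move
        total += disc * profit(a_price, b_price)
    return total


def best_threshold(v: int, gamma: float, thr_a: int, thr_b: int) -> int:
    """Lowest opponent price p at which undercutting to p - 1 is at least as
    good for A as opting out to v, with future play fixed by the thresholds."""
    for p in range(2, v + 1):
        undercut = value_after(p - 1, p, thr_a, thr_b, v, gamma)
        opt_out = value_after(v, p, thr_a, thr_b, v, gamma)
        if undercut >= opt_out:
            return p
    return v + 1  # A never prefers to undercut
```

For example, `best_threshold(v=10, gamma=0.9, thr_a=5, thr_b=5)` gives A's best-response threshold when future play by both sellers uses threshold 5; iterating this map is one possible way to search for a self-consistent threshold in the toy model.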

The parameter tex2html_wrap_inline580 is the value of tex2html_wrap_inline934 at which Agent B decides to set its price aggressively low. This is (approximately) the point at which tex2html_wrap_inline938 . Following the price trajectories up to the point where they join the standard price-war trajectory, we find:

  eqnarray191

from which we obtain

  equation194

Similarly, we can compute tex2html_wrap_inline584 by setting tex2html_wrap_inline942 :

  eqnarray199

from which we obtain

  equation202

Figure 5: Asymmetric solution: theoretical and observed tex2html_wrap_inline580, tex2html_wrap_inline582, tex2html_wrap_inline584, and tex2html_wrap_inline586 as functions of tex2html_wrap_inline576.

Figure 5 plots tex2html_wrap_inline580, tex2html_wrap_inline582, tex2html_wrap_inline584, and tex2html_wrap_inline586 as functions of tex2html_wrap_inline576 for tex2html_wrap_inline730. The solid circles are measurements obtained by running the Q-learning algorithm until the Bellman error is minimized; the solid curves are the theoretical approximations of Eqs. 18, 20, and 22 (valid provided that tex2html_wrap_inline966), together with the relation tex2html_wrap_inline906.

Figure 6: Bellman error for simultaneous Q-learning by agents 1 and 2, with tex2html_wrap_inline590. Each time unit represents a number of random updates equal to the total number of price pairs.

Interestingly, this solution just barely fails to be fully self-consistent. A clear symptom of the inconsistency appears in Figure 6, which plots the Bellman error (the discrepancy between the left-hand and right-hand sides of Eq. 3) as a function of training time for sellers A and B. The Bellman error, defined as the RMS error averaged with equal weight over all price pairs, comes extremely close to zero but then suddenly shoots up. It soon decreases again, dropping nearly to zero before shooting up once more, and the cycle continues indefinitely. For example, at time 464 the policies have the canonical pseudo-solution form, and the Bellman error is just 0.0007 for A and 0.0012 for B. At time 465, however, A's response curve suddenly shifts from tex2html_wrap_inline982 to tex2html_wrap_inline984, as one ridge in tex2html_wrap_inline704 at tex2html_wrap_inline988 rises just above a ridge at tex2html_wrap_inline990. This shift appears as a long finger extending across the cross plot in Figure 7.
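The Bellman-error measure described above can be sketched as follows. This is a hedged illustration that assumes a hypothetical form for Eq. 3, whose exact statement is not reproduced in this section: a seller's Q-value for posting price a when the opponent's price is s equals its immediate profit plus γ (`gamma`) times its best Q-value in the state produced by the opponent's greedy response to a. The shopbot-style `profit` function, the 0-based indexing, and the names `greedy` and `bellman_rms` are illustrative assumptions, not the authors' definitions.

```python
import math


def profit(my_price: int, other_price: int) -> float:
    """Hypothetical per-period profit: the cheaper seller takes all buyers;
    a tie splits them (zero marginal cost)."""
    if my_price < other_price:
        return float(my_price)
    if my_price == other_price:
        return 0.5 * my_price
    return 0.0


def greedy(Q, s: int) -> int:
    """Index of the highest-valued price in row s (first one wins ties)."""
    row = Q[s]
    return max(range(len(row)), key=row.__getitem__)


def bellman_rms(Q_self, Q_other, gamma: float) -> float:
    """RMS Bellman residual of Q_self, weighted equally over all price pairs.

    Q[s][a] is the value of posting price a+1 when the opponent's price is s+1.
    The next state is the opponent's greedy response to price a+1 (assumed).
    """
    n = len(Q_self)
    sq = 0.0
    for s in range(n):
        for a in range(n):
            s_next = greedy(Q_other, a)  # opponent's state is our price a+1
            target = profit(a + 1, s + 1) + gamma * max(Q_self[s_next])
            sq += (Q_self[s][a] - target) ** 2
    return math.sqrt(sq / (n * n))
```

With two price levels, all-zero Q-tables, and γ = 0, the residuals are the profits themselves (0.5, 0, 1, 1), so `bellman_rms([[0, 0], [0, 0]], [[0, 0], [0, 0]], 0.0)` evaluates to 0.75.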

Figure 7: Policy cross plot at time t=465 during the Q-learning run of Fig. 6.

The location of the finger suggests that the problem lies in a part of tex2html_wrap_inline568 that is relevant only during transients, before the price-war cycle has begun: somewhere around tex2html_wrap_inline996. Indeed, detailed analysis reveals that in this region tex2html_wrap_inline998 very slightly exceeds tex2html_wrap_inline1000 if all other Q-values are taken as described in Eqs. 15 and 16. As the Q-function gradually becomes more self-consistent and accurate, it eventually reaches the point where, for some value of tex2html_wrap_inline634 in the critical range, A's best response shifts from v to tex2html_wrap_inline1010. Analogous fingers may develop for other values of tex2html_wrap_inline634 in this range as well.

B soon discovers that, simply by shifting its threshold from tex2html_wrap_inline584 to tex2html_wrap_inline1010, it can undercut A. Interestingly, both analysis and observation show that A cannot retaliate by extending its finger a little further to the left; instead, it reverts to playing v. After a while, B shifts its threshold back up to tex2html_wrap_inline584, and the policy cycle is ready to begin anew. This dramatic, cyclical shift in the policies produces large, cyclical spikes in the Bellman error for both players. The irregularity of the spikes' amplitude and frequency is due to the randomness of the Q-learning algorithm, and the gradual lengthening of their period is due to the cooling of the tex2html_wrap_inline708 parameter.
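A toy version of a simultaneous Q-learning run of the kind shown in Figure 6 can be sketched as below. It is not the authors' implementation, and every modeling choice is an assumption: the shopbot-style `profit` function, a one-step backup in which the next state is the opponent's current greedy response, treating the cooled parameter as a learning rate with the arbitrary schedule `alpha0 / (1 + decay * t)`, and, following the Figure 6 caption, one time unit equal to v*v random updates per agent. The names `train`, `alpha0`, and `decay` are hypothetical.

```python
import random


def profit(my_price: int, other_price: int) -> float:
    """Hypothetical per-period profit: the cheaper seller takes all buyers;
    a tie splits them (zero marginal cost)."""
    if my_price < other_price:
        return float(my_price)
    if my_price == other_price:
        return 0.5 * my_price
    return 0.0


def greedy(Q, s: int) -> int:
    """Index of the highest-valued price in row s (first one wins ties)."""
    row = Q[s]
    return max(range(len(row)), key=row.__getitem__)


def train(v=10, gamma=0.5, units=50, alpha0=0.5, decay=0.05, seed=0):
    """Simultaneous tabular Q-learning for sellers A and B (toy sketch).

    Q[s][a]: value of posting price a+1 when the opponent's price is s+1.
    Each time unit performs v*v uniformly random updates per agent.
    """
    rng = random.Random(seed)
    Q = {"A": [[0.0] * v for _ in range(v)],
         "B": [[0.0] * v for _ in range(v)]}
    for t in range(units):
        alpha = alpha0 / (1.0 + decay * t)  # hypothetical cooling schedule
        for _ in range(v * v):
            for me, other in (("A", "B"), ("B", "A")):
                s, a = rng.randrange(v), rng.randrange(v)
                s_next = greedy(Q[other], a)  # opponent's current best reply
                target = profit(a + 1, s + 1) + gamma * max(Q[me][s_next])
                Q[me][s][a] += alpha * (target - Q[me][s][a])
    return Q
```

After training, `[greedy(Q["A"], s) + 1 for s in range(v)]` reads off A's learned response curve in this toy model; because both agents learn against each other's moving greedy policies, nothing guarantees that the tables settle to a fixed point.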



kephart
Tue Mar 21 00:33:02 EST 2000