
Symmetric solution

 

Figure 2: Cross plot of symmetric response function solutions for Q-learning with tex2html_wrap_inline570 .

For all values of tex2html_wrap_inline576 in the range tex2html_wrap_inline714 , the symmetric best-response policy R(p) is observed to have a functional form that depends on just two parameters tex2html_wrap_inline572 and tex2html_wrap_inline574 :

  equation82

Figure 2 illustrates the symmetric R(p) for the case tex2html_wrap_inline730 and tex2html_wrap_inline570 . This differs from the myoptimal policy of Fig. 1 in that undercutting does not continue all the way down to tex2html_wrap_inline620 . Instead, when the price gets down to tex2html_wrap_inline572 , the agent aggressively drops its price all the way down to a value tex2html_wrap_inline574 . The opponent's best response to tex2html_wrap_inline574 is to set the price back up to v. While this aggressive price lowering decreases the agent's immediate profit, it proves advantageous in the long run for at least two reasons. First, both agents avoid the lower portion of the price-war cycle, and so the average price over the course of a cycle is increased. Second, when the competitor responds with price v, the agent can then undercut at the price v-1, yielding a relatively high profit.
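The behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `p_star` (the threshold at which the aggressive drop occurs) and `p_0` (the price dropped to) are hypothetical labels for the two parameters of R(p), the numbers in the usage lines are made up, and the treatment of the boundary cases (whether the drop is triggered at the threshold itself or just below it) is an assumption that the prose does not settle.

```python
def symmetric_response(p, v, p_star, p_0):
    """Response to an opponent price p under the two-parameter form.

    Assumed boundary convention (hypothetical): the aggressive drop is
    triggered when the opponent's price is at or below p_star, and the
    reset to v is triggered when it is at or below p_0.
    """
    if p <= p_0:
        return v        # opponent is at the bottom: set the price back up to v
    if p <= p_star:
        return p_0      # aggressive drop all the way down to p_0
    return p - 1        # ordinary undercut by one price unit


def price_cycle(v, p_star, p_0):
    """Prices set in turn, starting from v, until the next reset to v."""
    prices = [v]
    while True:
        nxt = symmetric_response(prices[-1], v, p_star, p_0)
        if nxt == v:
            return prices
        prices.append(nxt)


# Illustrative numbers only; they are not taken from the figures.
cycle = price_cycle(v=10, p_star=6, p_0=3)
print(cycle)                    # [10, 9, 8, 7, 6, 3]
print(sum(cycle) / len(cycle))  # average price over one cycle
```

With these made-up values the cycle skips the prices 5 and 4 entirely, which is the sense in which both agents avoid the lower portion of the price war.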

To calculate the parameters tex2html_wrap_inline572 and tex2html_wrap_inline574 , consider first that tex2html_wrap_inline574 is the lowest price that any agent will set: below it, the advantage of increased market share is outweighed by the very small profit margin. The price tex2html_wrap_inline574 can be determined by noting that the discounted reward of choosing it must be just marginally higher than that of choosing price v. Taking price quantization into account, tex2html_wrap_inline574 must be the smallest integer such that tex2html_wrap_inline760 .
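The effect of price quantization on such a threshold can be illustrated with a small Python sketch. The helper names and the break-even value 37.4 are hypothetical, and the predicate passed in stands for whatever comparison of discounted rewards is being made; nothing here reproduces the actual Q values.

```python
import math


def smallest_integer_price(is_better, lo, hi):
    """Smallest integer p in [lo, hi] for which is_better(p) holds, else None.

    is_better is a hypothetical predicate, e.g. "the discounted reward of
    pricing at p exceeds that of pricing at v".
    """
    for p in range(lo, hi + 1):
        if is_better(p):
            return p
    return None


def just_above(x):
    """Smallest integer strictly greater than a real-valued break-even x."""
    return math.floor(x) + 1


# Made-up break-even point: the comparison flips at a price of 37.4.
print(smallest_integer_price(lambda p: p > 37.4, 1, 100))  # 38
print(just_above(37.4))                                    # 38
```

When the comparison flips at a single real-valued break-even point, the scan and the closed form agree; `just_above(x)` equals `math.ceil(x)` except when `x` is itself an integer.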

For low to moderate values of tex2html_wrap_inline576 , very accurate approximations to both Q values can be computed. First, consider tex2html_wrap_inline766 . As the undercutter, B's expected profit according to Eq. 1 is tex2html_wrap_inline770 . Next, Agent A will respond with tex2html_wrap_inline774 . Since B is still the undercutter, its profit will be another tex2html_wrap_inline770 . B will then respond by undercutting A with price v-1. Therefore, using Eq. 3,

  equation96

Now consider tex2html_wrap_inline786 . Since B is undercut by A, its expected reward is tex2html_wrap_inline792 . A then responds with v-1; this too undercuts B, and thus B again receives tex2html_wrap_inline792 . At its next opportunity, B undercuts Agent A with price v-2. Thus the Q-value in this case is

  equation99

To compute tex2html_wrap_inline810 and tex2html_wrap_inline812 , note that these price vectors are at or near the beginning of the price war cycle. Until the price drops down to tex2html_wrap_inline814 , B will alternately be the undercutter and the undercuttee. Thus, when B sets its price to p, the expected profits from its move and A's countermove will simply be tex2html_wrap_inline824 . At B's next turn, it will set its price to p-2, and so on. Therefore,

  eqnarray102

where the approximation comes about because the finite arithmetico-geometric series is being approximated by an infinite one; this is valid to the extent that tex2html_wrap_inline830 -- i.e. it assumes that B places negligible weight on events after the end of the price-war cycle.
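The quality of this kind of approximation can be checked numerically. The sketch below uses a generic arithmetico-geometric series with terms (a - k d) r^k; the symbols `a`, `d`, `r` and `n` are hypothetical stand-ins (a first-term value, a per-step decrement, a per-step discount ratio and a number of steps) and are not the quantities of the equations above.

```python
def finite_sum(a, d, r, n):
    """Sum of (a - k*d) * r**k for k = 0 .. n-1."""
    return sum((a - k * d) * r**k for k in range(n))


def infinite_sum(a, d, r):
    """Closed form of the same series summed to infinity, valid for |r| < 1."""
    return a / (1 - r) - d * r / (1 - r) ** 2


# Made-up numbers: strong discounting makes the truncation error negligible,
# weak discounting over a short cycle does not.
print(finite_sum(100, 2, 0.5, 40), infinite_sum(100, 2, 0.5))
print(finite_sum(100, 2, 0.95, 10), infinite_sum(100, 2, 0.95))
```

The two sums agree closely only when r raised to the number of steps is small, which is the condition under which events after the end of the cycle carry negligible weight.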

Noting that tex2html_wrap_inline574 must be the integer just greater than the value obtained by equating Eqs. 6 and 7, and using Eq. 8, we obtain

  equation115

where the approximation is quite accurate provided that the following condition is satisfied:

  equation121

Similarly, tex2html_wrap_inline572 must be the smallest price for which tex2html_wrap_inline838 , i.e. the point at which the future discounted reward of slightly undercutting tex2html_wrap_inline572 is just barely higher than that of aggressive undercutting to tex2html_wrap_inline574 . In the first scenario, the price sequence leading into the beginning of the price-war cycle is tex2html_wrap_inline844 , i.e. B's immediate profit will be higher but it will then be undercut twice in a row by A. In the second scenario, the price sequence leading into the price-war cycle will be tex2html_wrap_inline850 , i.e. B's immediate profit will be lower but it will undercut A twice in a row. The Q functions for these two scenarios can be computed:

  equation125

and

  equation128

and equated to yield

  equation131

which is accurate provided that Eq. 10 holds. A simpler but less accurate expression can be derived by substituting Eq. 9, ignoring the integer restrictions, and neglecting terms of O(1):

  equation138

Figure 3 plots the values of tex2html_wrap_inline572 and tex2html_wrap_inline574 as a function of tex2html_wrap_inline576 for tex2html_wrap_inline730 . The solid circles represent measurements taken by running the Q-learning algorithm to convergence, while the curves represent the theoretical approximations obtained from Eqs. 13 and 9 up to the point where Eq. 10 becomes seriously violated.

Figure 3: Symmetric solution: theoretical and observed tex2html_wrap_inline572 and tex2html_wrap_inline574 as a function of tex2html_wrap_inline576 . Wiggles in the theoretical curves are due to the integer ceiling functions in Eqs. 9 and 13.



kephart
Tue Mar 21 00:33:02 EST 2000