next up previous
Next: Symmetric and asymmetric solutions Up: Pseudo-convergent Q-Learning by Competitive Previous: Introduction

Background

A simple shopbot model was first proposed in [1] to study price dynamics that may occur when a large fraction of consumers have ready access to price information. It is postulated that increased price awareness on the part of buyers will induce sellers to be more responsive in their pricing, and that this will lead to increased reliance on automated price-setting agents, or ``pricebots''. It is therefore of interest to develop automated pricing algorithms that yield high profits for sellers, and to explore the price dynamics that ensue when various pricing algorithms [2] are employed.

Briefly, the model supposes that a commodity is offered for sale by S sellers. Buyers generate purchase orders at random times: they identify the lowest-priced seller from among a subset of from 1 to S prices, and they purchase from that seller if the price is less than the buyer's valuation. The buyer population can therefore be described in terms of the distribution of valuations, and the distribution of search strategies, denoted tex2html_wrap_inline612 , where tex2html_wrap_inline614 represents the fraction of buyers that compare the prices of i randomly selected sellers. At random intervals, individual sellers will decide unilaterally to revise their prices as they wish.

If the sellers are perfectly informed about the distributions of buyer valuations and search strategies, and if they know their competitors' current prices, they can employ a myopic best-response strategy. This entails choosing the price that will maximize the seller's expected profit, assuming no further changes to the price vector. (Of course, this assumption is violated when other sellers change their prices.) If all sellers use this ``myoptimal'' strategy, the market experiences cyclical price wars. The price war cycle begins with each seller charging tex2html_wrap_inline618 , the price that a profit-maximizing monopolist would charge. Each seller then competes for market share by lowering its price, and the prices fall linearly until a critical threshold tex2html_wrap_inline620 is reached. At this point, the prices immediately jump back up to tex2html_wrap_inline618 , and the cycle begins anew. The upward jump in price occurs when ownership of a large market share fails to compensate for the very low profit margin. Instead, it is better for an agent to give up on market share and simply cater to the much smaller segment of buyers that only check a single price.

Despite its obvious flaws, the myoptimal algorithm tends to perform well in head-to-head competition against other simple pricing algorithms, including one based on a one-shot game-theoretic analysis.gif However, there is clearly room for improvement. If a seller could foresee that undercutting would invite quick retaliation by its competitors, it might be less keen to undercut. If all sellers were to reason in this manner, one would expect higher average prices and profits. Previously, we have verified that a Q-learning approach can raise average profits in a two-seller market, not just for the shopbot model, but also for two other models of agent-mediated markets in which myoptimal pricing leads to price war cycles [7].

To understand how this improvement comes about, it is helpful to first consider myoptimal pricing. For simplicity, we assume that there are just S=2 sellers (A and B), each with production cost c=0, and that all consumers have the same valuation v=100. Then, provided that tex2html_wrap_inline634 does not exceed v, B's expected profit per sale systemwide is simply:

  equation27

   figure42
Figure 1: Cross plot of myopic response functions tex2html_wrap_inline566 and tex2html_wrap_inline568 . Thin zigzag curve represents price trajectory starting from the initial condition (0.40,0.69).

The myoptimal pricing policy of seller B is obtained by choosing the price tex2html_wrap_inline634 that maximizes tex2html_wrap_inline654 . This can be represented as a response function tex2html_wrap_inline566 :

  equation49

The myoptimal response function tex2html_wrap_inline566 is depicted in Fig. 1. The analogous and symmetric myoptimal reponse function tex2html_wrap_inline568 is rotated and superimposed. Starting from any given initial price vector, one can successively apply the response functions tex2html_wrap_inline662 and tex2html_wrap_inline664 . Graphically, this is represented by making alternate vertical and horizontal movements to the tex2html_wrap_inline662 and tex2html_wrap_inline664 curves. As illustrated in Fig. 1, this leads to the cyclical price war discussed above. Here and throughout the remainder of the paper, we shall assume that prices are quantized to integers.

How can sellers introduce foresight into their pricing decisions? One method, introduced in [7], optimizes cumulative future-discounted profit instead of immediate profit. One of the sellers, say B, regards any price that it may set plus A's anticipated response to it as a single timestep, and adds the expected profits from its move and A's countermove. Projecting further into the future, B computes its expected future-discounted profit as an infinite weighted sum of expected profits in future timesteps. More compactly, we define

  eqnarray56

where tex2html_wrap_inline576 is a discount parameter, ranging from 0 to 1, and tex2html_wrap_inline662 now represents the policy that optimizes tex2html_wrap_inline682 :

  equation59

It is well established that, if tex2html_wrap_inline664 represents any fixed policy (myoptimal or otherwise), then a reasonable updating procedure will yield a unique, stable future-discounted profit landscape tex2html_wrap_inline686 and associated response function tex2html_wrap_inline566 . However, if A similarly computes its tex2html_wrap_inline692 and associated response tex2html_wrap_inline568 , then a unique fixed point is not guaranteed.

In previous work [7] we found empirically that there are two basic solutions: a symmetric solution in which tex2html_wrap_inline568 and tex2html_wrap_inline566 have the same functional form, and an approximate broken-symmetry solution in which tex2html_wrap_inline664 and tex2html_wrap_inline662 are quite different from one another and slightly unstable. These were discovered by a standard Q-learning updating procedure in which, starting from initial functions tex2html_wrap_inline704 and tex2html_wrap_inline682 , a random seller and a random price vector were chosen, the right-hand side of Eq. 3 was evaluated for that seller and price vector using Eq. 4, and the Q-value for that seller and price vector was moved toward this computed value by a fraction tex2html_wrap_inline708 that diminished gradually with time.

Regardless of the details of the buyer valuation and search-strategy distributions, and regardless of the details of the initial conditions or updating procedure of the Q-learning procedure, we have found to our surprise that only these two solutions are ever observed, and that they always have exactly the same qualitative form. This has motivated a deeper analysis of their form and nature, which we now present.


next up previous
Next: Symmetric and asymmetric solutions Up: Pseudo-convergent Q-Learning by Competitive Previous: Introduction

kephart
Tue Mar 21 00:33:02 EST 2000