A simple shopbot model was first proposed in [1] to study price dynamics that may occur when a large fraction of consumers have ready access to price information. It is postulated that increased price awareness on the part of buyers will induce sellers to be more responsive in their pricing, and that this will lead to increased reliance on automated price-setting agents, or ``pricebots''. It is therefore of interest to develop automated pricing algorithms that yield high profits for sellers, and to explore the price dynamics that ensue when various pricing algorithms [2] are employed.
Briefly, the model supposes that a commodity is offered
for sale by S sellers. Buyers
generate purchase orders at random times: they identify the
lowest-priced seller from among a subset of from 1 to S prices, and
they purchase from that seller if the price is less than the
buyer's valuation. The buyer population can therefore be described in
terms of the distribution of valuations, and the
distribution of search strategies, denoted
, where
represents the fraction of buyers that compare the prices of i
randomly selected sellers. At random
intervals, individual sellers will decide unilaterally to revise
their prices as they wish.
If the sellers are perfectly informed about the distributions of buyer
valuations and search strategies, and if they know their competitors'
current prices, they can employ a myopic best-response strategy. This
entails choosing the price that will maximize the seller's expected
profit, assuming no further changes to the price
vector. (Of course, this assumption is violated when other sellers
change their prices.)
If all sellers use this ``myoptimal''
strategy, the market experiences cyclical price wars. The price war
cycle begins with each seller charging
, the price that a
profit-maximizing monopolist would charge. Each seller then competes
for market share by lowering its price, and the prices fall linearly
until a critical threshold
is reached.
At this point, the prices
immediately jump back up to
, and the cycle begins anew. The
upward jump in price occurs when
ownership of a large market share fails to compensate for the very low
profit margin. Instead, it is better for an agent to give up on market
share and simply cater to the much smaller segment of buyers that only
check a single price.
Despite its obvious flaws, the myoptimal algorithm tends to perform
well in head-to-head competition against other simple pricing
algorithms, including one based on a one-shot game-theoretic
analysis.
However, there is
clearly room for improvement. If a seller could foresee that
undercutting would invite quick retaliation by its competitors, it
might be less keen to undercut. If all sellers were to reason in this
manner, one would expect higher average prices and
profits. Previously, we have verified that a Q-learning approach
can raise average profits in a two-seller market, not just for the
shopbot model, but also for two other models of agent-mediated markets
in which myoptimal pricing leads to price war
cycles [7].
To understand how this improvement comes about, it is helpful to first
consider myoptimal pricing. For simplicity, we assume that there are
just S=2 sellers (A and B), each with production cost c=0, and
that all consumers have the same valuation v=100.
Then, provided that
does not exceed v, B's
expected profit per sale systemwide is simply:
Figure 1: Cross plot of myopic response functions
and
. Thin zigzag curve represents price trajectory
starting from the initial condition (0.40,0.69).
The myoptimal pricing policy of seller B is obtained by choosing the
price
that maximizes
.
This can be represented as a response function
:
The myoptimal response function
is
depicted in Fig. 1. The analogous and symmetric
myoptimal reponse function
is rotated and superimposed.
Starting from any given initial price vector, one can successively
apply the response functions
and
. Graphically, this
is represented by making alternate
vertical and horizontal movements to the
and
curves.
As illustrated in Fig. 1,
this leads to the cyclical price war discussed above.
Here and throughout the remainder of the paper, we shall
assume that prices are quantized to integers.
How can sellers introduce foresight into their pricing decisions? One method, introduced in [7], optimizes cumulative future-discounted profit instead of immediate profit. One of the sellers, say B, regards any price that it may set plus A's anticipated response to it as a single timestep, and adds the expected profits from its move and A's countermove. Projecting further into the future, B computes its expected future-discounted profit as an infinite weighted sum of expected profits in future timesteps. More compactly, we define
where
is a discount parameter, ranging
from 0 to 1, and
now represents the policy that optimizes
:
It is well established that, if
represents any fixed
policy (myoptimal or otherwise), then a reasonable updating
procedure will yield a unique, stable future-discounted profit
landscape
and associated response function
.
However, if A similarly computes its
and
associated response
, then a unique fixed point is not guaranteed.
In previous work [7] we
found empirically that there are two basic solutions: a
symmetric solution in which
and
have the same functional form,
and an approximate
broken-symmetry solution in which
and
are quite
different from one another and slightly unstable. These were
discovered by a standard Q-learning updating procedure in
which, starting from initial functions
and
, a random
seller and a random price vector were chosen, the right-hand side
of Eq. 3 was evaluated for that seller and
price vector using Eq. 4, and
the Q-value for that seller and price vector was moved
toward this computed value by a fraction
that diminished
gradually with time.
Regardless of the details of the buyer valuation and search-strategy distributions, and regardless of the details of the initial conditions or updating procedure of the Q-learning procedure, we have found to our surprise that only these two solutions are ever observed, and that they always have exactly the same qualitative form. This has motivated a deeper analysis of their form and nature, which we now present.