Our models make a number of simplifying assumptions relative to
the likely complexities of real agent economies.
The economy is restricted to two sellers,
competing on the basis of price, who
offer similar or identical products to a large population of
consumer agents. Prices are discretized and lie between
a minimum and maximum price; there are
typically
possible prices.
This renders the state space
small enough to use lookup tables to represent
the agents' pricing policies and expected profits. Time
is also discretized; at each time step,
the consumers compare the current prices of the sellers, and
instantaneously and deterministically choose to buy from
at most one seller. Hence at each time step, for each possible
pair of prices, there is a deterministic profit
obtained by each seller.
We also assume that the sellers alternately take turns adjusting their prices, rather than simultaneously setting prices. Alternating-turn dynamics is motivated by two considerations: (a) It ensures that there will be a deterministic optimal policy (Littman, 1994), and hence normal Q-learning, which yields deterministic policies, can apply. (b) In a realistic many-seller economy, it seems reasonable to assume that the sellers will adjust their prices at different times rather than at the same time (although probably not in a well-defined order).
The three economic models studied here are
described in detail elsewhere.
In the first model, called the
``Price-Quality'' model (Sairamesh and Kephart, 1998),
the sellers' products have
different values of a scalar ``quality'' parameter, with
higher-quality products being perceived as more valuable.
At each time step, the consumers buy
the lowest-priced product subject to constraints of
a maximum allowable price and a minimum allowable quality.
The substitutability of seller products enables
direct price competition, and the ``vertical''
differentiation of differing quality values leads to
asymmetries in the sellers' profit functions. Such asymmetries
can result in unending cyclic price wars
when the sellers employ myoptimal (myopically optimal) pricing strategies.
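As a minimal sketch of the consumer rule described above, the following Python function picks the lowest-priced seller whose price and quality satisfy a consumer's constraints. The function name, the argument names, the inclusive comparisons, and the tie-breaking rule (lowest index wins) are illustrative assumptions, not details taken from the Price-Quality model itself.

```python
from typing import Optional, Sequence


def choose_seller(
    prices: Sequence[float],
    qualities: Sequence[float],
    max_price: float,
    min_quality: float,
) -> Optional[int]:
    """Return the index of the cheapest acceptable seller, or None.

    A seller is acceptable when its price does not exceed max_price and its
    quality is at least min_quality (inclusive bounds are an assumption).
    """
    acceptable = [
        i
        for i, (price, quality) in enumerate(zip(prices, qualities))
        if price <= max_price and quality >= min_quality
    ]
    if not acceptable:
        return None  # the consumer buys from at most one seller, here none
    # min() keeps the first minimum, so ties go to the lowest seller index.
    return min(acceptable, key=lambda i: prices[i])
```

With two sellers, a consumer whose quality threshold excludes the lower-quality seller buys from the higher-quality one even at a higher price. This is one way the vertical differentiation could produce asymmetric profit functions.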
The second model, described in (Kephart, Hanson and Sairamesh, 1998), is an ``Information-Filtering'' model in which the two sellers offer news articles in partly overlapping categories. This model contains a ``horizontal'' differentiation of article categories. To the extent that the categories overlap, there can be direct price competition, and to the extent that they differ, there are asymmetries that again lead to the potential for cyclic price wars.
The third model is the ``Shopbot'' model described in (Greenwald and Kephart, 1999), which models the situation on the Internet in which some consumers use a shopbot to compare prices of all sellers offering a given product, and select the lowest-priced seller. In this model, the sellers' products are identical, and their profit functions are symmetric. Myoptimal pricing leads the sellers to undercut each other until the minimum price point is reached. At that point, a new price war cycle can be launched, due to asymmetric buyer behavior, rather than seller asymmetries. Some buyers choose a random seller rather than bargain-hunt with the shopbot; this makes it profitable to abandon the low-price competition, and instead maximally exploit the random buyers by charging the maximum possible price.
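The undercut-then-reset cycle can be illustrated with a toy simulation. The sketch below is not the Shopbot model of (Greenwald and Kephart, 1999); it assumes a hypothetical price grid of 1 to 10, zero production cost, a 75/25 split between shopbot users and random buyers, and an even split of shopbot users on a price tie. All names and numbers are chosen only for illustration.

```python
from typing import List

# Toy Shopbot-style duopoly; every number below is an illustrative assumption.
PRICES = list(range(1, 11))   # discretized price levels 1..10
SHOPBOT_FRACTION = 0.75       # buyers who purchase from the cheapest seller
RANDOM_FRACTION = 0.25        # buyers who pick one of the two sellers at random


def profit(own: int, rival: int) -> float:
    """Per-step profit of a seller with zero production cost (assumed)."""
    share = RANDOM_FRACTION / 2          # half of the random buyers
    if own < rival:
        share += SHOPBOT_FRACTION        # wins all shopbot users
    elif own == rival:
        share += SHOPBOT_FRACTION / 2    # shopbot users split on a tie
    return own * share


def myoptimal_response(rival: int) -> int:
    """Price that maximizes immediate profit against the rival's price."""
    return max(PRICES, key=lambda p: profit(p, rival))


def simulate(steps: int, start: int = 10) -> List[int]:
    """Sellers alternate turns; return the price chosen at each step."""
    current = [start, start]
    chosen = []
    for t in range(steps):
        mover = t % 2
        current[mover] = myoptimal_response(current[1 - mover])
        chosen.append(current[mover])
    return chosen


if __name__ == "__main__":
    print(simulate(12))
```

Under these assumptions the chosen prices step down from 9 to 2, after which the seller whose turn it is jumps back to 10 and the cycle restarts. In this toy setting the reset occurs as soon as serving only the random buyers at the maximum price beats any further undercutting.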
Figure 1: Sample profit landscape for seller 1 in the
Price-Quality model, as a function of seller 1's price
and seller 2's price.
An example seller profit function, taken from the Price-Quality
model, is plotted in figure 1. This shows the
instantaneous profit for seller 1 as a function of the two sellers' prices.
The quality parameters are
,
(i.e.
seller 1 is the higher-quality seller).
The myoptimal policy for seller 1 is obtained, for each value of
seller 2's price, by sweeping across all values of seller 1's price
and choosing the value with the highest profit.
For small values of seller 2's price,
the peak profit is obtained at
, whereas for
larger values of seller 2's price, there is eventually a discontinuous shift
to the other peak, which follows the parabolic-shaped ridge
along the diagonal.
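The sweep described above can be sketched as a column-wise maximization over a profit lookup table. The function name and the small 3x3 table below are hypothetical; the numbers are not taken from the Price-Quality model and are chosen only so that the best response shifts from a fixed high price to an undercutting price as the rival's price rises.

```python
from typing import Dict, Sequence


def myoptimal_policy(
    profit_table: Sequence[Sequence[float]],
    own_prices: Sequence[float],
    rival_prices: Sequence[float],
) -> Dict[float, float]:
    """Map each rival price to the own price with the highest immediate profit.

    profit_table[i][j] is the profit at own_prices[i] versus rival_prices[j].
    Ties go to the lowest own-price index (an arbitrary choice).
    """
    policy = {}
    for j, rival in enumerate(rival_prices):
        # Sweep over all own prices for this fixed rival price.
        best_i = max(range(len(own_prices)), key=lambda i: profit_table[i][j])
        policy[rival] = own_prices[best_i]
    return policy


# Hypothetical table: rows are seller 1 prices, columns are seller 2 prices.
PRICES = [1, 2, 3]
TABLE = [
    [0.5, 1.0, 1.0],   # seller 1 charges 1
    [0.0, 1.0, 2.0],   # seller 1 charges 2
    [1.2, 1.2, 1.2],   # seller 1 charges 3
]
POLICY = myoptimal_policy(TABLE, PRICES, PRICES)
```

For this table the policy maps rival prices 1 and 2 to an own price of 3, and rival price 3 to an own price of 2. This is a small-scale analogue of the discontinuous shift between two peaks.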
The Information-Filtering and Shopbot models also have similar profit landscapes. In all three models, it is the existence of multiple, disconnected peaks in the landscapes, with varying relative heights depending on the other seller's price, that leads to price wars when the sellers behave myopically.
In these models it is assumed for simplicity that the players have essentially perfect information. They can model the consumer behavior perfectly, and they also have perfect knowledge of each other's costs and profit functions. Hence each model is in essence a two-player, perfect-information, deterministic game, similar to games like chess. The main differences are that the payoffs are not strictly zero-sum, there are no terminating nodes in the state space, and payoffs are given to the players at every time step.