next up previous
Next: Single-agent Q-learning Up: Pricing in agent economies Previous: Introduction

Model agent economies

Real agent economies are likely to contain large numbers of agents, with complex details of how the agents behave and interact with each other on multiple time scales. Our approach toward modeling and understanding such complexity is to begin by making a number of simplifying assumptions. We first consider the simplest possible case of two competing seller agents offering similar or identical products to a large population of consumer agents. The sellers compete on the basis of price, and we assume that prices are discretized and can lie between a minimum and maximum price, such that the number of possible prices is at most a few hundred. This renders the state space small enough that it is feasible to use lookup tables to represent the agents' pricing policies and expected profits. Time in the simulation is also discretized; at each time step, we assume that the consumers compare the current prices of the two sellers, and instantaneously and deterministically choose to purchase from at most one seller. Hence at each time step, for each possible pair of seller prices, there is a deterministic reward or profit given to each seller. The simulation can iterate forever, and there may or may not be a discounting factor for the present value of future rewards.

It is worth noting that the consumers are not regarded as ``players'' in the model. The consumers have no strategic role: they behave according to an extremely simple, fixed, short-term greedy rule (buy the lowest priced product at each time step), and are regarded as merely providing a stationary environment in which the two sellers can compete in a two-player game. This is clearly a simplifying first step in the study of multi-agent phenomena, and in future work, the models will be extended to include strategic and adaptive behavior on the part of the consumers as well. This will change the notion of ``desirable'' system behavior. In the present model, desirable behavior would resemble ``collusion'' between the two sellers in charging very high prices, so that both could obtain high profits. Obviously this is not desirable from the consumers' viewpoint.

Regarding the dynamics of seller price adjustments, we assume that the sellers alternately take turns adjusting their prices, rather than simultaneously setting prices (i.e. the game is extensive-form rather than normal-form). Our choice of alternating-turn dynamics is motivated by two considerations: (a) As the number of sellers becomes large and the model becomes more realistic, it seems more reasonable to assume that the sellers will adjust their prices at different times rather than at the same time, although they probably will not take turns in a well-defined order. (b) With alternating-turn dynamics, we can stay within the normal Q-learning framework where the Q-function implies a deterministic optimal policy: it is known that in two-player alternating turn games, there always exists a deterministic policy that is as good as any non-deterministic policy (Littman, 1994). In contrast, in games with simultaneous moves (for example, rock-paper-scissors), it is possible that no deterministic policy is optimal, and that the existing Q-learning formalism for MDPs would have to be modified and extended so that it could yield non-deterministic optimal policies.

We study Q-learning in three different economic models that have been described in detail elsewhere (Sairamesh and Kephart, 1998; Kephart, Hanson and Sairamesh, 1998; Greenwald and Kephart, 1999). The first model, called the ``Price-Quality'' model (Sairamesh and Kephart, 1998), models the sellers' products as being distinguished by different values of a scalar ``quality'' parameter, with higher-quality products being perceived as more valuable by the consumers. The consumers are modeled as trying to obtain the lowest-priced product at each time step, subject to threshold-type constraints on both quality and price, i.e., each consumer has a maximum allowable price and a minimum allowable quality. The similarity and substitutability of seller products leads to a potential for direct price competition; however, the ``vertical'' differentiation due to differening quality values leads to an asymmetry in the sellers' profit functions. It is believed that this asymmetry is responsible for the unending cyclic price wars that emerge when the sellers employ myoptimal pricing strategies.

The second model is an ``Information-Filtering'' model described in detail in (Kephart, Hanson and Sairamesh, 1998). In this model there are two competing sellers of news articles in somewhat overlapping categories. In contrast to the vertical differentiation of the Price-Quality model, this model contains a horizontal differentiation in the differing article categories. To the extent that the categories overlap, there can be direct price competition, and to the extent that they differ, there are asymmetries introduced that again lead to the potential for cyclic price wars.

The third model is the so-called ``Shopbot'' model described in (Greenwald and Kephart, 1999), which is intended to model the situation on the Internet in which some consumers may use a Shopbot to compare prices of all sellers offering a given product, and select the seller with the lowest price. In this model, the sellers' products are exactly identical and the profit functions are symmetric. Myoptimal pricing leads the sellers to undercut each other until the minimum price point is reached. At that point, a new price war cycle can be launched, due to buyer asymmetries rather than seller asymmetries. The fact that not all buyers use the Shopbot, and some buyers instead choose a seller at random, means that it can be profitable for a seller to abandon the low-price competition for the bargain hunters, and instead maximally exploit the random buyers by charging the maximum possible price.

An example profit function that we study, taken from the Price-Quality model, is as follows: Let tex2html_wrap_inline320 and tex2html_wrap_inline322 represent the prices charged by seller 1 and seller 2 respectively. Let tex2html_wrap_inline356 and tex2html_wrap_inline358 represent their respective quality parameters, with tex2html_wrap_inline360 . Let c(q) represent the cost to a seller of producing an item of quality q. Then assuming the particular model of consumer behavior described in (Sairamesh and Kephart, 1998) one can show analytically that in the limit of infinitely many consumers, the instantaneous profits per consumer tex2html_wrap_inline366 and tex2html_wrap_inline368 obtained by seller 1 and seller 2 respectively are given by:

  equation11

  equation19

A plot of the profit landscape for seller 1 as a function of prices tex2html_wrap_inline320 and tex2html_wrap_inline322 is given in figure 1, for the following parameter settings: tex2html_wrap_inline384 , tex2html_wrap_inline386 , and c(q) = 0.1 (1+q). (These specific parameter settings were chosen because they are known to generate harmful price wars when the agents use myopic optimal pricing.) We can see in this figure that the myopic optimal price for seller 1 as a function of seller 2's price, tex2html_wrap_inline390 , is obtained for each value of tex2html_wrap_inline322 by sweeping across all values of tex2html_wrap_inline320 and choosing the value that gives the highest profit. We can see that for small values of tex2html_wrap_inline322 , the peak profit is obtained at tex2html_wrap_inline398 , whereas for larger values of tex2html_wrap_inline322 , there is eventually a discontinuous shift to the other peak, which follows along the parabolic-shaped ridge in the landscape. An analytic expression for the myopic optimal price for seller 1 as a function of tex2html_wrap_inline322 is as follows (defining tex2html_wrap_inline404 and tex2html_wrap_inline406 ):

  equation28

Similarly, the myopic optimal price for seller 2 as a function of the price set by seller 1, tex2html_wrap_inline414 , is given by the following formula (assuming that prices are discrete and that tex2html_wrap_inline416 is the price discretization interval):

  equation41

   figure54
Figure 1: Sample profit landscape for seller 1 in Price-Quality model, as a function of seller 1 price tex2html_wrap_inline320 and seller 2 price tex2html_wrap_inline322 .

We also note in passing that there are similar profit landscapes for each of the sellers in the Information-Filtering model and in the Shopbot model. In all three models, it is the existence of multiple, disconnected peaks in the landscapes, with relative heights that can change depending on the other seller's price, that leads to price wars when the sellers behave myopically.

Regarding the information set that is made available to the sellers, we have made a simplifying assumption as a first step that the players have essentially perfect information. They can model the consumer behavior perfectly, and they also have perfect knowledge of each other's costs and profit functions. Hence our model is thus a two-player perfect-information deterministic game that is very similar to games like chess. The main differences are that the profits in our model are not strictly zero-sum, and that there are no terminating or absorbing nodes in our model's state space. Also in our model, payoffs are given to the players at every time step, whereas in games such as chess, payoffs are only given at the terminating nodes.

As mentioned previously, we constrain the prices set by the two sellers to lie in a range from some minimum to maximum allowable price. The prices are discretized, so that one can create lookup tables for the seller profit functions tex2html_wrap_inline428 . Furthermore, the optimal pricing policies for each seller as a function of the other seller's price, tex2html_wrap_inline430 and tex2html_wrap_inline432 , can also be represented in the form of table lookups.


next up previous
Next: Single-agent Q-learning Up: Pricing in agent economies Previous: Introduction

kephart
Wed Sep 29 11:53:06 EDT 1999