Next: Single-agent Q-learning Up: Pricing in agent economies Previous: Introduction

Model agent economies

Real agent economies are likely to contain large numbers of agents, with complex details of how the agents behave and interact with each other on multiple time scales. In order to make initial progress, a number of simplifying assumptions are made. The economy is restricted to two competing sellers, offering similar or identical products to a large population of consumer agents. The sellers compete on the basis of price, and it is assumed that prices are discretized and can lie between a minimum and maximum price, such that the number of possible prices is at most a few hundred. This renders the state space small enough that it is feasible to use lookup tables to represent the agents' pricing policies and expected utilities. Time in the simulation is also discretized; at each time step, the consumers compare the current prices of the two sellers, and instantaneously and deterministically choose to purchase from at most one seller. Hence at each time step, for each possible pair of seller prices, there is a deterministic reward or utility given to each seller. The simulation can iterate forever, and there may or may not be a discounting factor for the present value of future rewards.

It is worth noting that the consumers are not regarded as ``players'' in the model. The consumers have no strategic role: they behave according to an extremely simple, fixed, short-term greedy rule (buy the lowest priced product at each time step), and are regarded as merely providing a stationary environment in which the two sellers can compete in a two-player game. This is clearly a simplifying first step in the study of multi-agent phenomena, and in future work, the models will be extended to include strategic and adaptive behavior on the part of the consumers as well. This will change the notion of ``desirable'' system behavior. In the present model, desirable behavior would resemble ``collusion'' between the two sellers in charging very high prices, so that both could obtain high profits. Obviously this is not desirable from the consumers' viewpoint.

Regarding the dynamics of seller price adjustments, it is assumed that the sellers alternately take turns adjusting their prices, rather than simultaneously setting prices (i.e. the game is extensive form rather than normal form). The choice of alternating-turn dynamics is motivated by two considerations: (a) As the number of sellers becomes large and the model becomes more realistic, it seems more reasonable to assume that the sellers will adjust their prices at different times rather than at the same time, although they probably will not take turns in a well-defined order. (b) With alternating-turn dynamics, one can stay within the normal Q-learning framework where the Q-function implies a deterministic optimal policy: it is known that in two-player alternating turn games, there always exists a deterministic policy that is as good as any non-deterministic policy (Littman, 1994). In contrast, in games with simultaneous moves (for example, rock-paper-scissors), it is possible that no deterministic policy is optimal, and that the existing Q-learning formalism for MDPs would have to be modified and extended so that it could yield non-deterministic optimal policies.

Q-learning is studied in three different economic models that have been described in detail elsewhere (Sairamesh and Kephart, 1998; Kephart, Hanson and Sairamesh, 1998; Greenwald and Kephart, 1999). The first model, called the ``Price-Quality'' model (Sairamesh and Kephart, 1998), models the sellers' products as being distinguished by different values of a scalar ``quality'' parameter, with higher-quality products being perceived as more valuable by the consumers. The consumers are modeled as trying to obtain the lowest-priced product at each time step, subject to threshold-type constraints on both quality and price, i.e., each consumer has a maximum allowable price and a minimum allowable quality. The similarity and substitutability of seller products leads to a potential for direct price competition; however, the ``vertical'' differentiation due to differing quality values leads to an asymmetry in the sellers' utility functions. It is believed that this asymmetry is responsible for the unending cyclic price wars that emerge when the sellers employ myoptimal pricing strategies.
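The consumer rule of the Price-Quality model can be illustrated with a minimal sketch. The function name `choose_seller`, the representation of an offer as a `(price, quality)` pair, the inclusive thresholds, and the tie-breaking rule (lowest seller index wins) are all assumptions made for illustration; the cited papers should be consulted for the exact consumer model.

```python
from typing import Optional, Sequence, Tuple


def choose_seller(
    offers: Sequence[Tuple[float, float]],
    max_price: float,
    min_quality: float,
) -> Optional[int]:
    """Return the index of the seller a consumer buys from, or None.

    offers[i] is a (price, quality) pair for seller i. The consumer
    accepts only offers within its price ceiling and quality floor
    (thresholds assumed inclusive), then takes the cheapest one.
    """
    acceptable = [
        (price, i)
        for i, (price, quality) in enumerate(offers)
        if price <= max_price and quality >= min_quality
    ]
    if not acceptable:
        return None  # the consumer buys from at most one seller, possibly none
    # min() on (price, index) pairs breaks price ties by lowest index (assumption).
    return min(acceptable)[1]


# Seller 0: high price, high quality. Seller 1: low price, lower quality.
offers = [(0.8, 1.0), (0.6, 0.9)]
choice = choose_seller(offers, max_price=0.7, min_quality=0.5)
```

With these illustrative numbers the consumer picks seller 1, since seller 0 exceeds the price ceiling. Raising `min_quality` above 0.9 while keeping the same ceiling leaves no acceptable offer.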

The second model is an ``Information-Filtering'' model described in detail in (Kephart, Hanson and Sairamesh, 1998). In this model there are two competing sellers of news articles in somewhat overlapping categories. In contrast to the vertical differentiation of the Price-Quality model, this model contains a horizontal differentiation in the differing article categories. To the extent that the categories overlap, there can be direct price competition, and to the extent that they differ, there are asymmetries introduced that again lead to the potential for cyclic price wars.

The third model is the so-called ``Shopbot'' model described in (Greenwald and Kephart, 1999), which is intended to model the situation on the Internet in which some consumers may use a Shopbot to compare prices of all sellers offering a given product, and select the seller with the lowest price. In this model, the sellers' products are exactly identical and the utility functions are symmetric. Myoptimal pricing leads the sellers to undercut each other until the minimum price point is reached. At that point, a new price war cycle can be launched, due to buyer asymmetries rather than seller asymmetries. The fact that not all buyers use the Shopbot, and some buyers instead choose a seller at random, means that it can be profitable for a seller to abandon the low-price competition for the bargain hunters, and instead maximally exploit the random buyers by charging the maximum possible price.
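The cyclic undercutting described for the Shopbot model can be reproduced in a toy simulation. This is a sketch under stated assumptions, not the utility function of Greenwald and Kephart (1999): it assumes ten integer price levels, zero production cost, a fraction `W` of buyers who use the Shopbot (ties split evenly), and remaining buyers who split evenly between the two sellers. All names below are hypothetical.

```python
from fractions import Fraction

PRICES = range(1, 11)   # assumed discrete price levels 1..10
W = Fraction(1, 2)      # assumed fraction of buyers who use the Shopbot


def profit(p_own: int, p_other: int) -> Fraction:
    """Per-buyer profit of a seller under the toy model (zero cost)."""
    share = (1 - W) / 2            # random buyers split evenly
    if p_own < p_other:
        share += W                 # Shopbot users all pick the cheaper seller
    elif p_own == p_other:
        share += W / 2             # ties are split evenly (assumption)
    return p_own * share


def best_response(p_other: int) -> int:
    """Myopic optimal price: sweep all prices, keep the most profitable.

    max() returns the first maximum, so ties go to the lowest price.
    """
    return max(PRICES, key=lambda p: profit(p, p_other))


def simulate(steps: int, start: int = 10) -> list:
    """Alternating-turn dynamics: sellers take turns best-responding."""
    prices = [start, start]
    history = []
    for t in range(steps):
        mover = t % 2
        prices[mover] = best_response(prices[1 - mover])
        history.append(prices[mover])
    return history


trajectory = simulate(14)
```

In this parameterization the sellers undercut each other step by step until undercutting earns less than serving only the random buyers at the maximum price, at which point the mover jumps back to the top price and the cycle restarts. The price at which the reset occurs depends on the assumed value of `W` and on the price grid.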

An example economic utility function, taken from the Price-Quality model, is as follows. Let $p_1$ and $p_2$ represent the prices charged by seller 1 and seller 2 respectively. Let $q_1$ and $q_2$ represent their respective quality parameters, with tex2html_wrap_inline479 . Let $c(q)$ represent the cost to a seller of producing an item of quality $q$. Then assuming the particular model of consumer behavior described in [9], one can show analytically that in the limit of infinitely many consumers, the instantaneous utilities (profits per consumer) $U_1$ and $U_2$ obtained by seller 1 and seller 2 respectively are given by:

  equation15

  equation23

A plot of the utility landscape for seller 1 as a function of prices $p_1$ and $p_2$ is given in Figure 1, for the following parameter settings: tex2html_wrap_inline503 , tex2html_wrap_inline505 , and $c(q) = 0.1 (1+q)$. (These specific parameter settings were chosen because they are known to generate harmful price wars when the agents use myopic optimal pricing.) We can see in this figure that the myopic optimal price for seller 1 as a function of seller 2's price, $p_1^*(p_2)$, is obtained for each value of $p_2$ by sweeping across all values of $p_1$ and choosing the value that gives the highest utility. We can see that for small values of $p_2$, the peak utility is obtained at tex2html_wrap_inline517 , whereas for larger values of $p_2$, there is eventually a discontinuous shift to the other peak, which follows along the parabolic-shaped ridge in the landscape. An analytic expression for the myopic optimal price for seller 1 as a function of $p_2$ is as follows (defining tex2html_wrap_inline523 and tex2html_wrap_inline525 ):

  equation32

Similarly, the myopic optimal price for seller 2 as a function of the price set by seller 1, $p_2^*(p_1)$, is given by the following formula (assuming that prices are discrete and that tex2html_wrap_inline535 is the price discretization interval):

  equation45

Figure 1: Sample utility landscape for seller 1 in the Price-Quality model, as a function of seller 1 price $p_1$ and seller 2 price $p_2$.

There are also similar utility landscapes for each seller in the Information-Filtering model and in the Shopbot model. In all three models, it is the existence of multiple, disconnected peaks in the landscapes, with relative heights that can change depending on the other seller's price, that leads to price wars when the sellers behave myopically.

In these models it is assumed for simplicity that the players have essentially perfect information. They can model the consumer behavior perfectly, and they also have perfect knowledge of each other's costs and utility functions. Hence the model is in essence a two-player perfect-information deterministic game, similar to games like chess. The main differences are that the utilities are not strictly zero-sum, and that there are no terminating or absorbing nodes in the state space. Also, payoffs are given to the players at every time step, whereas in games such as chess, payoffs are only given at the terminating nodes.

As mentioned previously, the possible seller prices are constrained to lie in a range from some minimum to maximum allowable price. The prices are discretized, so that one can create lookup tables for the seller utility functions tex2html_wrap_inline547 . Furthermore, the optimal pricing policies for each seller as a function of the other seller's price, tex2html_wrap_inline549 and tex2html_wrap_inline551 , can also be represented in the form of table lookups.
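The table-lookup representation can be sketched as follows. The helper names (`price_grid`, `utility_table`, `myopic_policy`) are hypothetical, and `toy_utility` is a placeholder winner-takes-all payoff chosen only to make the example runnable; it is not the Price-Quality utility function. The policy computed here is the myopic best response obtained by sweeping over a seller's own prices for each price of the other seller.

```python
from typing import Callable, List


def price_grid(p_min: float, p_max: float, n_levels: int) -> List[float]:
    """Evenly spaced prices from p_min to p_max inclusive."""
    step = (p_max - p_min) / (n_levels - 1)
    return [p_min + k * step for k in range(n_levels)]


def utility_table(
    utility: Callable[[float, float], float], grid: List[float]
) -> List[List[float]]:
    """table[i][j] = utility of charging grid[i] when the rival charges grid[j]."""
    return [[utility(p_own, p_other) for p_other in grid] for p_own in grid]


def myopic_policy(table: List[List[float]]) -> List[int]:
    """For each rival price index j, the own-price index with highest utility.

    max() keeps the first maximum, so ties go to the lowest price index.
    """
    n = len(table)
    return [max(range(n), key=lambda i: table[i][j]) for j in range(n)]


def toy_utility(p_own: float, p_other: float) -> float:
    """Placeholder payoff: the cheaper seller takes all sales, ties split."""
    if p_own < p_other:
        return p_own
    if p_own == p_other:
        return p_own / 2
    return 0.0


grid = price_grid(0.0, 1.0, 5)            # [0.0, 0.25, 0.5, 0.75, 1.0]
table = utility_table(toy_utility, grid)
policy = myopic_policy(table)             # own-price index per rival-price index
```

With at most a few hundred price levels, a table of this kind has at most on the order of 10^5 entries per seller, which is why explicit lookup tables are feasible for both the utilities and the pricing policies.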



kephart
Wed Sep 29 11:51:48 EDT 1999