Next: Single and Multi-agent Q-learning Up: Multi-agent Q-learning and regression Previous: Introduction

Model agent economies

Our models make a number of simplifying assumptions relative to the likely complexities of real agent economies. The economy is restricted to two sellers, competing on the basis of price, who offer similar or identical products to a large population of consumer agents. Prices are discretized and lie between a minimum and maximum price; there are typically tex2html_wrap_inline249 possible prices. This renders the state space small enough to use lookup tables to represent the agents' pricing policies and expected profits. Time is also discretized; at each time step, the consumers compare the current prices of the sellers, and instantaneously and deterministically choose to buy from at most one seller. Hence at each time step, for each possible pair of prices, there is a deterministic profit obtained by each seller.
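The lookup-table representation mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the helper names (`make_price_grid`, `tabulate_profit`) and the `toy_profit` function are hypothetical and are not taken from any of the three models; the toy rule simply gives all demand to the cheaper seller and splits it on ties, at zero cost.

```python
from typing import Callable, List


def make_price_grid(p_min: float, p_max: float, n: int) -> List[float]:
    """Return n evenly spaced prices from p_min to p_max inclusive."""
    step = (p_max - p_min) / (n - 1)
    return [p_min + i * step for i in range(n)]


def tabulate_profit(profit_fn: Callable[[float, float], float],
                    grid: List[float]) -> List[List[float]]:
    """table[i][j] is the profit when this seller charges grid[i]
    and the other seller charges grid[j]."""
    return [[profit_fn(p_own, p_other) for p_other in grid] for p_own in grid]


def toy_profit(p_own: float, p_other: float) -> float:
    """Hypothetical profit: the cheaper seller serves all demand,
    ties split the demand evenly, and production cost is zero."""
    if p_own < p_other:
        return p_own
    if p_own == p_other:
        return p_own / 2
    return 0.0


grid = make_price_grid(0.0, 1.0, 5)          # [0.0, 0.25, 0.5, 0.75, 1.0]
table = tabulate_profit(toy_profit, grid)    # 5 x 5 deterministic profit table
```

Because the profit for every price pair is deterministic, the whole table can be computed once and then indexed by the pair of price indices.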

We also assume that the sellers take turns adjusting their prices, rather than setting prices simultaneously. Alternating-turn dynamics is motivated by two considerations: (a) It ensures that there will be a deterministic optimal policy (Littman, 1994), and hence normal Q-learning, which yields deterministic policies, can apply. (b) In a realistic many-seller economy, it seems reasonable to assume that the sellers will adjust their prices at different times rather than at the same time (although probably not in a well-defined order).
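A minimal sketch of alternating-turn dynamics, assuming each seller's policy is a function from the other seller's current price index to its own new price index. The function `simulate_alternating` and the `undercut_by_one` policy are hypothetical names introduced for illustration only.

```python
from typing import Callable, List, Tuple

# A policy maps the rival's current price index to the seller's new price index.
Policy = Callable[[int], int]


def simulate_alternating(policy1: Policy, policy2: Policy,
                         start: Tuple[int, int], steps: int) -> List[Tuple[int, int]]:
    """Seller 1 moves on even time steps, seller 2 on odd time steps.
    Returns the list of (seller 1 index, seller 2 index) states visited."""
    i1, i2 = start
    history = [(i1, i2)]
    for t in range(steps):
        if t % 2 == 0:
            i1 = policy1(i2)   # seller 1 reacts to seller 2's standing price
        else:
            i2 = policy2(i1)   # seller 2 reacts to seller 1's standing price
        history.append((i1, i2))
    return history


def undercut_by_one(rival_index: int) -> int:
    """Hypothetical policy: price one grid step below the rival, never below index 0."""
    return max(rival_index - 1, 0)


trajectory = simulate_alternating(undercut_by_one, undercut_by_one, start=(4, 4), steps=4)
# trajectory == [(4, 4), (3, 4), (3, 2), (1, 2), (1, 0)]
```

Only one coordinate of the state changes per time step, which is what distinguishes this from simultaneous price setting.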

The three economic models studied here are described in detail elsewhere. In the first model, called the ``Price-Quality'' model (Sairamesh and Kephart, 1998), the sellers' products have different values of a scalar ``quality'' parameter, with higher-quality products being perceived as more valuable. At each time step, the consumers buy the lowest-priced product subject to constraints of a maximum allowable price and a minimum allowable quality. The substitutability of seller products enables direct price competition, and the ``vertical'' differentiation of differing quality values leads to asymmetries in the sellers' profit functions. Such asymmetries can result in unending cyclic price wars when the sellers employ myoptimal pricing strategies.
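The consumer decision rule of the Price-Quality model, as stated above, can be sketched for a single consumer. The function name `choose_seller`, the inclusive treatment of both constraints, and the tie-breaking rule (lowest seller index) are assumptions made for this illustration and are not specified in the text.

```python
from typing import Optional, Sequence


def choose_seller(prices: Sequence[float], qualities: Sequence[float],
                  max_price: float, min_quality: float) -> Optional[int]:
    """Return the index of the lowest-priced seller whose price does not exceed
    max_price and whose quality is at least min_quality, or None if no seller
    qualifies. Ties on price go to the lowest index (an assumption)."""
    eligible = [i for i in range(len(prices))
                if prices[i] <= max_price and qualities[i] >= min_quality]
    if not eligible:
        return None          # the consumer buys from at most one seller
    return min(eligible, key=lambda i: (prices[i], i))


# A quality-sensitive consumer pays more for seller 0's higher quality.
picky = choose_seller([0.8, 0.6], [1.0, 0.5], max_price=1.0, min_quality=0.7)
# A consumer with a low quality threshold takes the cheaper seller 1.
thrifty = choose_seller([0.8, 0.6], [1.0, 0.5], max_price=1.0, min_quality=0.4)
```

Summing such choices over a population with varied thresholds yields a deterministic profit for each price pair, and the quality difference makes the two sellers' profit functions asymmetric.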

The second model, described in (Kephart, Hanson and Sairamesh, 1998), is an ``Information-Filtering'' model in which the two sellers offer news articles in partly overlapping categories. This model contains a ``horizontal'' differentiation of article categories. To the extent that the categories overlap, there can be direct price competition, and to the extent that they differ, there are asymmetries that again lead to the potential for cyclic price wars.

The third model is the ``Shopbot'' model described in (Greenwald and Kephart, 1999), which models the situation on the Internet in which some consumers use a shopbot to compare prices of all sellers offering a given product, and select the lowest-priced seller. In this model, the sellers' products are identical, and their profit functions are symmetric. Myoptimal pricing leads the sellers to undercut each other until the minimum price point is reached. At that point, a new price war cycle can be launched, due to asymmetric buyer behavior, rather than seller asymmetries. Some buyers choose a random seller rather than bargain-hunt with the shopbot; this makes it profitable to abandon the low-price competition, and instead maximally exploit the random buyers by charging the maximum possible price.
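The undercut-then-reset cycle can be reproduced in a toy setting. The sketch below is not the Shopbot model of (Greenwald and Kephart, 1999): it assumes zero cost, integer prices from 1 to 10, a fraction `w` of buyers who always buy from the strictly cheaper seller (splitting evenly on ties), and the remaining buyers split evenly at random. All names and parameter values are hypothetical.

```python
from typing import List


def shopbot_profit(p_own: int, p_other: int, w: float) -> float:
    """Toy per-buyer profit: a fraction w compares prices, the rest pick at random."""
    random_share = (1.0 - w) / 2.0
    if p_own < p_other:
        shopbot_share = w
    elif p_own == p_other:
        shopbot_share = w / 2.0
    else:
        shopbot_share = 0.0
    return p_own * (random_share + shopbot_share)


def best_response(p_other: int, grid: List[int], w: float) -> int:
    """Myopically optimal price: maximize immediate profit against p_other."""
    return max(grid, key=lambda p: shopbot_profit(p, p_other, w))


def price_war(grid: List[int], w: float, steps: int) -> List[int]:
    """Alternate myopic best responses, starting from both sellers at the maximum
    price. Returns the price chosen by the moving seller at each step."""
    p1 = p2 = max(grid)
    moves = []
    for t in range(steps):
        if t % 2 == 0:
            p1 = best_response(p2, grid, w)
            moves.append(p1)
        else:
            p2 = best_response(p1, grid, w)
            moves.append(p2)
    return moves


moves = price_war(list(range(1, 11)), w=0.5, steps=8)
# moves == [9, 8, 7, 6, 5, 4, 10, 9]
```

In this toy parameterization the reset to the maximum price occurs at the point where undercutting stops paying, which need not coincide with the lowest price on the grid.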

Figure 1: Sample profit landscape for seller 1 in the Price-Quality model, as a function of seller 1 price $p_1$ and seller 2 price $p_2$.

An example seller profit function, taken from the Price-Quality model, is plotted in Figure 1. This shows the instantaneous profit for seller 1, tex2html_wrap_inline255 . The quality parameters are tex2html_wrap_inline257 , tex2html_wrap_inline259 (i.e. seller 1 is the higher-quality seller). The myoptimal policy for seller 1, tex2html_wrap_inline261 , is obtained for each value of seller 2's price $p_2$ by sweeping across all values of seller 1's price $p_1$ and choosing the value with the highest profit. For small $p_2$, the peak profit is obtained at tex2html_wrap_inline269 , whereas for larger $p_2$, there is eventually a discontinuous shift to the other peak, which follows the parabolic-shaped ridge along the diagonal.
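The sweep that defines the myoptimal policy can be sketched as an exhaustive search over seller 1's prices for each seller 2 price. The profit function used here, `two_peak_profit`, is a hypothetical stand-in with two competing peaks (one captive buyer who always buys, plus two buyers who buy only from a strictly cheaper seller 1); it is not the Price-Quality profit function plotted in Figure 1.

```python
from typing import Callable, Dict, List


def myoptimal_policy(profit: Callable[[int, int], float],
                     grid: List[int]) -> Dict[int, int]:
    """For each seller 2 price p2, sweep all seller 1 prices p1 and keep the one
    with the highest immediate profit (first maximum wins on ties)."""
    policy = {}
    for p2 in grid:
        best_p1 = grid[0]
        best_profit = profit(best_p1, p2)
        for p1 in grid[1:]:
            u = profit(p1, p2)
            if u > best_profit:
                best_p1, best_profit = p1, u
        policy[p2] = best_p1
    return policy


def two_peak_profit(p1: int, p2: int) -> float:
    """Hypothetical landscape: 1 captive buyer plus 2 buyers won only if p1 < p2."""
    buyers = 1 + (2 if p1 < p2 else 0)
    return float(p1 * buyers)


policy = myoptimal_policy(two_peak_profit, list(range(1, 11)))
# policy[4] == 10 (stay on the high-price peak), policy[5] == 4 (jump to the undercutting ridge)
```

The jump in the policy between rival prices 4 and 5 is the toy analogue of the discontinuous shift between peaks described above.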

The Information-Filtering and Shopbot models also have similar profit landscapes. In all three models, it is the existence of multiple, disconnected peaks in the landscapes, with varying relative heights depending on the other seller's price, that leads to price wars when the sellers behave myopically.

In these models it is assumed for simplicity that the players have essentially perfect information. They can model the consumer behavior perfectly, and they also have perfect knowledge of each other's costs and profit functions. Hence the model is in essence a two-player perfect-information deterministic game, similar to games like chess. The main differences are that the payoffs are not strictly zero-sum, there are no terminating nodes in the state space, and payoffs are given to the players at every time step.



kephart
Tue Mar 21 00:52:15 EST 2000