
Incomplete Information: Simulations of Agent Behavior

The analytical results we have just presented identify which price schedule a producer should choose given that it has perfect information about the distribution of consumer preferences. In this setting, a profit-maximizing producer will set prices to maximally exploit consumers. However, a real-life producer seldom knows everything about consumer preferences. With incomplete information, each period of pricing and consumer purchasing may reveal information that enables the producer to update and improve its estimate of the consumer preference distribution. Generally, the more accurate the producer's estimate, the higher its one-period profits will be.

Therefore, pricing decisions now serve two functions: exploitation and exploration. In general, the prices that extract maximal expected profit from consumers in the current period (given the producer's current beliefs) will not provide the maximal improvement in the estimate of the preference distribution for use in future pricing decisions. Thus, there will be a tradeoff between exploitation and exploration. For this reason, we now explore how producers might learn about preferences through a dynamic sequence of price-purchase interactions with consumers. In particular, we examine how much profit is accumulated over time when the producer follows different learning methods on price schedules of varying complexity. Different exploitation-exploration strategies will tend to converge at different rates, and will also yield different accumulations of profit during the learning phase.

One important consideration is how much the producer knows about the structure of preferences. In practice, a producer will not know the exact functional form of the consumers' preference model(s). In one extreme case, if the producer has no prior knowledge of the structure of preferences (other than that demand is decreasing in price), it will have to adjust the parameters of its pricing schedule based on the apparent correlation between prices and the profit signals it receives.

Whether the producer is trying to learn the structure of consumer preferences (to thereby derive the profit landscape) or to learn the profit landscape (or even just its peak) directly, there are any number of potential learning algorithms. In this paper we experiment with two off-the-shelf learning systems: a neural network trained using Quickprop [Fah88], and amoeba, a direct search method for finding the optimum of a multidimensional function.

Quickprop is a variant of the backpropagation method for training feedforward neural networks. In addition to the traditional gradient information used to minimize error, Quickprop uses second-derivative information, estimated from successive gradient evaluations, to increase the rate of convergence, as well as a momentum term to carry the net through plateaus in the space of network weights.
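The core of the Quickprop step can be illustrated on a single weight. The following is a minimal sketch, not the network used in our experiments: it applies a secant-style update (the second-derivative estimate) to a one-dimensional error function, with a cap on step growth. The function name quickprop_minimize, the parameters lr and mu, and the quadratic test function are illustrative assumptions; the momentum term and the network itself are omitted.

```python
def quickprop_minimize(grad, w0, lr=0.1, mu=1.75, steps=50, tol=1e-10):
    """Minimize a 1-D error function from its gradient with a Quickprop-style step.

    Simplified sketch: one weight, no network, no momentum term.
    `lr` (first-step learning rate) and `mu` (maximum growth factor)
    are assumed values chosen for illustration.
    """
    w = w0
    prev_g = None      # gradient at the previous weight
    prev_dw = 0.0      # previous weight change
    shrink = mu / (1.0 + mu)
    for _ in range(steps):
        g = grad(w)
        if abs(g) < tol:
            break
        if prev_g is None:
            # No history yet: take a plain gradient-descent step.
            dw = -lr * g
        elif g * prev_g > 0 and abs(g) >= shrink * abs(prev_g):
            # Slope barely decreased: the secant step would be too large
            # (or point backward), so grow the last step by at most mu.
            dw = mu * prev_dw
        else:
            # Secant step: jump to the minimum of the parabola fitted
            # through the last two gradient values.
            dw = g / (prev_g - g) * prev_dw
        w += dw
        prev_g, prev_dw = g, dw
    return w


# Example: error(w) = (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3.
w_star = quickprop_minimize(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

On a quadratic error surface the secant step lands on the minimum as soon as it is taken, which is the source of the speedup over plain gradient descent.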

Amoeba is described in Numerical Recipes [Pre92]. It is a variant of the Nelder-Mead simplex algorithm [NM65] for nonlinear unconstrained optimization problems. (This simplex algorithm should not be confused with the simplex algorithm for linear programming.) At each iteration, the amoeba algorithm maintains a nondegenerate simplex, a geometric figure in n dimensions of nonzero volume that is the convex hull of n+1 vertices, together with the function values at those vertices. In each iteration, new points are computed, along with their function values, to form a new simplex. The algorithm terminates when the function values at the vertices of the simplex satisfy some predetermined condition. For a detailed survey of amoeba, refer to [WPMD91].
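The simplex iteration described above can be sketched as follows. This is a simplified pure-Python variant written for illustration, not the Numerical Recipes routine: it uses the customary reflection, expansion, inside-contraction, and shrink moves, and stops when the spread of function values across the vertices falls below a tolerance. The names nelder_mead and neg_profit, the step and tolerance values, and the two-parameter profit surface are all hypothetical.

```python
def nelder_mead(f, x0, step=1.0, max_iter=500, tol=1e-10):
    """Minimize f over n dimensions with a simplified Nelder-Mead search."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex offset along each coordinate axis.
    simplex = [list(x0)]
    for i in range(n):
        p = list(x0)
        p[i] += step
        simplex.append(p)
    values = [f(p) for p in simplex]

    for _ in range(max_iter):
        # Order vertices from best (lowest f) to worst (highest f).
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        # Termination test on the function values at the vertices.
        if abs(values[-1] - values[0]) < tol:
            break
        worst = simplex[-1]
        # Centroid of every vertex except the worst one.
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]

        def along(t):
            # Point on the line through the centroid and the worst vertex.
            return [c + t * (w - c) for c, w in zip(centroid, worst)]

        reflected = along(-1.0)
        fr = f(reflected)
        if fr < values[0]:
            # Reflection beat the best vertex: try going further.
            expanded = along(-2.0)
            fe = f(expanded)
            if fe < fr:
                simplex[-1], values[-1] = expanded, fe
            else:
                simplex[-1], values[-1] = reflected, fr
        elif fr < values[-2]:
            simplex[-1], values[-1] = reflected, fr
        else:
            contracted = along(0.5)
            fc = f(contracted)
            if fc < values[-1]:
                simplex[-1], values[-1] = contracted, fc
            else:
                # Shrink every vertex halfway toward the best one.
                best = simplex[0]
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, p)]
                    for p in simplex[1:]
                ]
                values = [values[0]] + [f(p) for p in simplex[1:]]

    best_i = min(range(n + 1), key=lambda i: values[i])
    return simplex[best_i], values[best_i]


def neg_profit(p):
    # Hypothetical two-parameter profit surface peaking at (1.0, 0.5);
    # negated because the search minimizes.
    return (p[0] - 1.0) ** 2 + 2.0 * (p[1] - 0.5) ** 2 - 3.0


best_params, best_value = nelder_mead(neg_profit, [0.0, 0.0])
```

Because the method needs only function values, a producer could apply it directly to observed profit signals without knowing the functional form of consumer preferences.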

In a competitive environment, of course, a producer might find it advantageous to develop a learning algorithm customized to the problem at hand. Our goal is not to discover the best learning algorithm, but to understand the tradeoffs between learning and exploitation that occur across ranges of pricing structures, and to compare these results using two common algorithms.

Learning can be confounded if the consumer population evolves (through exit and/or entry), or if consumer preferences change, especially if a producer is only able to sample the consumer population infrequently (for example, based on a journal that is issued quarterly). In "moving target" multiagent learning problems [VD98b] such as this, some amount of residual error in what is learned is unavoidable. For such problems, we expect that strategies that learn faster, albeit ultimately less accurately or completely, may be favored. Likewise, strategies that perform well despite errors in the estimation of preferences should be favored. We hypothesize that pricing schedules with fewer parameters can generally be learned more quickly. It is less clear whether there is a systematic relationship between the complexity of a price schedule and the robustness of its performance while learning. To explore these questions, we have designed experiments that assess learning speed and robustness while learning for a variety of learning methods and pricing structures. We measure both the convergence rate and the average profit per agent per period prior to convergence.
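The two measurements can be made concrete with a small sketch. The convergence criterion used here is an assumption made for illustration only: a run is treated as converged from the first period after which per-agent profit stays within a relative tolerance of a known positive target profit. The function name learning_metrics, the parameter rel_tol, and the sample series are hypothetical.

```python
def learning_metrics(profits, target, rel_tol=0.05):
    """Return (convergence_period, mean_profit_before_convergence).

    `profits` is a list of per-agent profits, one entry per period.
    Assumed criterion: converged at the start of the final unbroken run
    of periods with profit >= (1 - rel_tol) * target, for target > 0.
    convergence_period is None if the last period is below the threshold.
    """
    threshold = (1.0 - rel_tol) * target
    converged_at = None
    # Walk backward to find where the final above-threshold run begins.
    for t in reversed(range(len(profits))):
        if profits[t] < threshold:
            break
        converged_at = t
    # Periods before convergence (all periods if the run never converged).
    pre = profits if converged_at is None else profits[:converged_at]
    mean_pre = sum(pre) / len(pre) if pre else None
    return converged_at, mean_pre


# Example series: profit climbs toward a target of 10 per agent per period.
series = [1.0, 2.0, 3.0, 9.6, 9.8, 10.0]
period, mean_before = learning_metrics(series, target=10.0)
```

Under this criterion, a method that converges later can still be preferable if its mean profit before convergence is higher, which is why both quantities are reported.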





kephart
Sat Oct 23 00:54:56 EDT 1999