
Introduction

Electronic goods are very flexible. In contrast to physical goods, marginal costs are negligible and nearly limitless bundling and unbundling of these items are possible. Consequently, producers can offer complex pricing and bundling schemes that would be infeasible for traditional commerce in physical goods. Considering only pricing structures that are based on the number of items in a bundle, and not on the identity of the items, there are families of such pricing functions with one free parameter, two parameters, and so forth. In the limit, the most general pricing function for this problem has N parameters, where N is the total number of different information goods under consideration. Within each family are many possible functional forms (e.g., piecewise linear, polynomial, etc.), and different profits can be obtained with different functions drawn from the same family.
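To make the idea of parameterized pricing families concrete, the following is a minimal sketch of three size-based pricing functions: a one-parameter linear price, a two-parameter two-part tariff, and a fully general schedule with N parameters. The function names, the `fee` and `p` parameters, and the convention that an empty bundle costs nothing are assumptions made for illustration and are not taken from the paper.

```python
from typing import Callable, Sequence

# A pricing function maps a bundle size n (number of items) to a total price.
PriceFn = Callable[[int], float]


def linear_price(p: float) -> PriceFn:
    """One-parameter family: every item costs p, so n items cost p * n."""
    return lambda n: p * n


def two_part_tariff(fee: float, p: float) -> PriceFn:
    """Two-parameter family: a fixed fee plus a per-item price p.

    Assumption: a consumer who buys nothing pays nothing.
    """
    return lambda n: 0.0 if n == 0 else fee + p * n


def nonlinear_price(schedule: Sequence[float]) -> PriceFn:
    """N-parameter family: schedule[n - 1] is the price of any n-item bundle."""
    def price(n: int) -> float:
        if n == 0:
            return 0.0  # assumption: an empty bundle is free
        return schedule[n - 1]
    return price
```

Each family depends only on the number of items in the bundle, never on which items they are, which matches the restriction stated above.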

Producers of electronic information goods therefore face a daunting challenge: how to explore the space of all possible bundles and price schemes to find the optimal combination. Even though the space to be searched is quite large, search over and experimentation with product bundling and price structures are feasible in an agent-mediated economy because transaction costs are lower. Our goals in this paper are thus twofold: to learn something about designing economically intelligent agents, and to learn about the consequences of interactions between today's not-so-economically-intelligent agents as they search for the best bundle/price niche.

It would seem reasonable to assume that a producer that has more free parameters to control in pricing will be more profitable, since it will be able to fit the consumer demand curve more accurately. What then is there to learn? Why not always use unrestricted nonlinear pricing (a different, unconstrained price for every bundle size)? It turns out that optimal pricing under more complex schemes requires more knowledge about consumer preferences than simple pricing schemes require. Learning about consumer preferences takes time; meanwhile, the firm is earning less than the optimal profit. Furthermore, a complex price schedule may be more difficult to explain to consumers, and more difficult for consumers to evaluate when determining their best response to these prices. If these costs are high relative to the additional profit the complex scheme could potentially provide, then a producer is likely to settle on a simpler pricing strategy.

Uncertainty about consumer preferences affects many producer decisions. Early papers in the economics literature studied how agents optimally choose between competing opportunities of unknown reward, often referred to as multi-armed bandit problems [Wei79]. Agents weigh the value of the information gained by experimenting against the cost of experimentation (such as forgone short-run profits). This tradeoff between exploitation and exploration, and the problem of how to determine the optimal sequence of actions over a period of time, is a primary focus of the reinforcement learning literature. [SB98] provides an excellent overview of this area. Related to our paper, some authors have studied how a firm chooses a one-parameter linear price when it faces uncertain consumer preferences [Rot74]. One author shows that with incomplete learning, the optimal linear price may never be reached [GLS84].
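As an illustration of the exploration/exploitation tradeoff in a bandit setting, the sketch below applies epsilon-greedy selection to a small grid of candidate prices. Epsilon-greedy is a standard textbook technique chosen here only for illustration; it is not claimed to be the learning method used in this paper. The function name, the price grid, and the toy profit values are all hypothetical.

```python
import random
from typing import Callable, List, Tuple


def epsilon_greedy(profit: Callable[[int], float], n_prices: int, steps: int,
                   epsilon: float, rng: random.Random) -> Tuple[List[float], List[int]]:
    """Estimate the mean profit of each candidate price by trial and error.

    profit(i) returns the (possibly noisy) profit observed when price i is posted.
    """
    estimates = [0.0] * n_prices  # running mean profit per price
    counts = [0] * n_prices       # number of times each price was tried
    for t in range(steps):
        if t < n_prices:
            arm = t                          # try every price once first
        elif rng.random() < epsilon:
            arm = rng.randrange(n_prices)    # explore: pick a random price
        else:
            # exploit: pick the price with the best estimate so far
            arm = max(range(n_prices), key=lambda a: estimates[a])
        reward = profit(arm)
        counts[arm] += 1
        # incremental update of the sample mean
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


# Hypothetical example: three candidate prices with fixed toy profits.
toy_profits = [1.0, 3.0, 2.0]
estimates, counts = epsilon_greedy(lambda i: toy_profits[i], 3, 500, 0.1,
                                   random.Random(0))
best_price_index = max(range(3), key=lambda a: estimates[a])
```

Every exploratory step spent on an inferior price is profit given up in the short run, which is the cost of experimentation described above.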

There is also an extensive economic literature on how a firm can use multi-parameter pricing schedules to extract greater surplus when the distribution of consumers is known, but individual identities are not (or the firm is not allowed to tailor individual-dependent prices). See [Wil93] for a thorough overview. [MR84] present a method for deriving the most profitable unconstrained nonlinear pricing scheme when consumers are differentiated by a single taste parameter. Most papers on related topics assume that the distribution of consumer valuations is known by the firm. One exception studied the tradeoffs between maximizing current profits (exploitation) and charging lower prices in early periods to learn more about the consumer population [BO94].

Multiagent learning has become a popular research topic; [Wei99] contains a chapter summarizing recent work. [VD98a] examines the problem of modeling other agents and discusses conditions under which this sort of modeling is useful. A similar problem is examined in [CM96], where opponent strategies in repeated games are represented as a finite state machine. [HW98] discusses the problem of multiagent learning in a general-sum game and shows how agents are able to learn a Nash equilibrium strategy.

In this paper we use analytic methods to derive optimal prices under pricing schemes of varying complexity for a model with complete information. We measure the increase in profits as more parameters are controlled by a monopolist producer. We show that the majority of the gains take place as we move from one to two parameters. Simulations are used to explore a dynamic model in which the monopolist is uncertain about consumer valuations and thus learns the optimal prices gradually and perhaps imperfectly. The analytical solutions provide a benchmark for the maximum profits that could be attained by the firm in steady state. The simulations provide a means of measuring the costs of a more complex scheme. As the complexity of a pricing schedule increases, it takes longer to learn, but in some cases, particularly that of a two-part tariff, the transitional profits outperform those of the simpler pricing schemes, due to the shape of the profit landscape. We also see that schedules with the same number of parameters may perform differently, despite having identical steady-state profits. We find that the choice of learning method and exploration strategy strongly affects the speed of convergence and, more importantly for our purpose, the magnitude of forgone profits during the learning period.



kephart
Sat Oct 23 00:54:56 EDT 1999