Pricebot Strategies

 

When sufficiently widespread adoption of shopbots by buyers forces sellers to become more competitive, it is likely that sellers will respond by creating pricebots that automatically set prices in an attempt to maximize profitability. It seems unrealistic, however, to expect that pricebots will simply compute a Nash equilibrium and fix prices accordingly. The real business world is fraught with uncertainties that undermine the validity of traditional game-theoretic analyses: sellers lack perfect knowledge of buyer demands and have an incomplete understanding of competitors' strategies. To be profitable, pricebots will need to learn from and adapt to changing market conditions.

We now introduce four pricebot strategies, each of which places different requirements on the type and amount of information available to the agent and upon the agent's computational power.

GT
The game-theoretic strategy is designed to reproduce the mixed strategy Nash equilibrium that was computed in the previous section. It makes use of full information about the buyer population, and it assumes its competitors utilize game-theoretic pricing as well.

GT is a constant function since it makes no use of historical observations. Nonetheless, it is of interest in our simulation studies in part because there exist learning algorithms that converge to stage game-theoretic equilibria over repeated play (see Foster and Vohra [6] and Greenwald [8]).

MY
The myopically optimal, or myoptimal, pricing strategy (see, for example, [11]) uses information about all the buyer characteristics that factor into the buyer demand function, as well as competitors' prices, but makes no attempt to account for competitors' pricing strategies. Instead, it is based on the assumption of static expectations: even if one seller is contemplating a price change under myoptimal pricing, this seller does not assume that this will elicit a response from its competitors; it assumes that competitors' prices will remain fixed.

The myoptimal seller s uses all available information and the assumption of static expectations to perform an exhaustive search for the price $p_s^*$ that maximizes its expected profit $\pi_s$. The computational demands can be reduced greatly if the price quantum $\epsilon$ (the smallest amount by which one seller may undercut another) is sufficiently small. Under such circumstances, the optimal price $p_s^*$ is guaranteed to be either the monopolistic price $p_m$ or $\epsilon$ below some competitor's price, limiting the search for $p_s^*$ to S possible values. In our simulations, we choose tex2html_wrap_inline1005 .
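The reduced search described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `min_price` floor, and in particular the toy profit function (a share of buyers picks a seller at random, the rest buy from the strictly cheapest seller) are assumptions introduced here for the example and do not reproduce the demand model of the previous section.

```python
from typing import Callable, Sequence


def myoptimal_price(
    competitor_prices: Sequence[float],
    monopoly_price: float,
    epsilon: float,
    expected_profit: Callable[[float, Sequence[float]], float],
    min_price: float = 0.0,  # hypothetical floor, e.g. production cost
) -> float:
    """Return the candidate price with the highest expected profit.

    Candidates are the monopolistic price and epsilon below each
    competitor's price, so at most S values are evaluated.
    Competitors' prices are treated as fixed (static expectations).
    """
    candidates = [monopoly_price]
    for p in competitor_prices:
        undercut = p - epsilon
        if undercut >= min_price:
            candidates.append(undercut)
    return max(candidates, key=lambda p: expected_profit(p, competitor_prices))


def toy_profit(price: float, competitor_prices: Sequence[float],
               random_share: float = 0.2, bargain_share: float = 0.8) -> float:
    """Hypothetical profit per buyer, assuming zero production cost.

    A fraction random_share of buyers picks one of the S sellers at random;
    a fraction bargain_share buys only from the strictly cheapest seller.
    """
    num_sellers = len(competitor_prices) + 1
    demand = random_share / num_sellers
    if price < min(competitor_prices):
        demand += bargain_share
    return price * demand


best = myoptimal_price([0.8, 0.6], monopoly_price=1.0, epsilon=0.01,
                       expected_profit=toy_profit)
print(best)
```

With these toy numbers the three candidates are the monopolistic price 1.0 and the two undercuts of 0.8 and 0.6; only the lower undercut captures the bargain-hunting buyers, so it is selected.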

DF
The derivative-following strategy is less informationally intensive than either the myoptimal or the game-theoretic pricing strategies. In particular, this strategy can be used in the absence of any knowledge or assumptions about one's competitors or the buyer demand function. A derivative follower simply experiments with incremental increases (or decreases) in price, continuing to move its price in the same direction until the observed profitability level falls, at which point the direction of movement is reversed. The price increment $\delta$ is chosen randomly from a specified probability distribution; in the simulations described here the distribution was uniform between 0.01 and 0.02.
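A derivative follower needs only its own last observed profit, which makes it easy to sketch. The class below is illustrative rather than the authors' code: the class and attribute names, the initial upward direction, and the clamp that keeps the price non-negative are assumptions made here. The increment is drawn uniformly from [0.01, 0.02], as in the simulations described above.

```python
import random
from typing import Optional


class DerivativeFollower:
    """Moves its price in one direction until observed profit falls, then reverses."""

    def __init__(self, price: float, rng: Optional[random.Random] = None,
                 lo: float = 0.01, hi: float = 0.02) -> None:
        self.price = price
        self.direction = 1  # assumption: start by raising the price
        self.last_profit: Optional[float] = None
        self.rng = rng or random.Random()
        self.lo, self.hi = lo, hi

    def update(self, observed_profit: float) -> float:
        """Record the latest profit and return the next price to post."""
        # Reverse direction only when profit fell relative to the last observation.
        if self.last_profit is not None and observed_profit < self.last_profit:
            self.direction = -self.direction
        self.last_profit = observed_profit
        delta = self.rng.uniform(self.lo, self.hi)
        # Assumption: prices are not allowed to go negative.
        self.price = max(0.0, self.price + self.direction * delta)
        return self.price


df = DerivativeFollower(price=0.5, rng=random.Random(0))
p1 = df.update(1.0)   # first observation: keep moving up
p2 = df.update(2.0)   # profit rose: keep moving up
p3 = df.update(1.5)   # profit fell: reverse and move down
```

Note that the agent never inspects competitors' prices or the demand function; everything it knows arrives through `observed_profit`.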

Q
The Q-learning price-setting strategy is based on a reinforcement learning procedure called Q-learning [18], which can learn optimal pricing policies for Markov Decision Problems (MDPs). It does so by learning the function Q(x,a) representing the cumulative discounted payoff of taking action a in state x. The discounted payoff is expressed as $\sum_{n=0}^{\infty} \gamma^n r_n$, where $r_n$ is the expected reward n time steps in the future, and $\gamma$ is a constant "discount parameter" lying between 0 and 1. The optimal policy for a given state is the action that maximizes the Q-function for that state. Q-learning yields a deterministic policy, and is therefore unable to represent equilibrium play in games where the equilibria are composed solely of randomized strategies. Q-learning finds the optimal policy in cases where the Q-learner's opponents use stationary Markovian strategies. Situations that deviate from this, such as history-dependent opponents or non-stationary learning opponents (e.g., another Q-learner), constitute an interesting and open research topic that is touched upon here and in some of our prior work (see Tesauro and Kephart [15]).
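For readers unfamiliar with Q-learning, the sketch below shows the standard tabular update rule, Q(x,a) ← Q(x,a) + α [r + γ max_b Q(x',b) − Q(x,a)], together with the greedy (deterministic) policy it induces. It is a generic illustration, not the configuration used in the simulations: the learning rate `alpha`, the value of `gamma`, the dictionary-based table, and the state label are all assumptions made here.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, float], float]


def q_update(Q: QTable, state: Hashable, action: float, reward: float,
             next_state: Hashable, actions: Sequence[float],
             alpha: float = 0.1, gamma: float = 0.5) -> None:
    """Apply one standard tabular Q-learning update in place."""
    # Value of acting greedily from the next state onward.
    best_next = max(Q[(next_state, b)] for b in actions)
    target = reward + gamma * best_next
    Q[(state, action)] += alpha * (target - Q[(state, action)])


def greedy_action(Q: QTable, state: Hashable, actions: Sequence[float]) -> float:
    """Deterministic policy: the action that maximizes Q for this state."""
    return max(actions, key=lambda a: Q[(state, a)])


Q: QTable = defaultdict(float)   # unseen (state, action) pairs start at 0
prices = [0.5, 1.0]              # hypothetical discrete price grid
q_update(Q, state="s", action=1.0, reward=2.0, next_state="s", actions=prices)
q_update(Q, state="s", action=1.0, reward=2.0, next_state="s", actions=prices)
chosen = greedy_action(Q, "s", prices)
```

Because `greedy_action` always returns a single price for a given state, the resulting policy is deterministic, which is why a Q-learner of this kind cannot represent a purely mixed-strategy equilibrium.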


kephart
Tue Sep 28 21:57:17 EDT 1999