next up previous
Next: References Up: Strategic Pricebot Dynamics Previous: Q vs. Q

Conclusions and Future Work

 

This paper examined several approaches to the design and implementation of pricebot algorithms that differ in their informational and computational requirements, and further differ in terms of their merits and limitations when tested against themselves and one another. The simplest algorithm is derivative following (DF) -- its price adjustments are trivial to compute, and it requires no knowledge of either the buyer responses or of the other sellers' prices. DF is a reasonable approach in the absence of such knowledge, and performs well when other sellers also use DF. If more knowledge is available, however, other approaches can be designed to yield greater profits. If expected buyer demand is known, it is often possible to compute a game-theoretic equilibrium solution (GT). While this strategy is perhaps appropriate for one-shot games, in iterated games where there is knowledge of other sellers' prices, it is possible to outperform GT. The myoptimal strategy (MY), for example, uses knowledge of expected buyer demand to compute a short-term optimal price (but ignores the expected long-term behavior of its competitors) and outperforms both GT and DF. Computation of MY would appear to require an exhaustive search over all possible prices; however, MY's computational overhead can be reduced to examining only tex2html_wrap_inline993 -undercuts of the other sellers' prices and the monopolistic price, for sufficiently small values of tex2html_wrap_inline993 .

Pricing based on Q-learning is superior to myoptimal pricing since Q anticipates the longer-term consequences of its actions resulting both from competitor responses and its own counter-responses. Thus, among the pricing strategies studied, Q appears to offer the most promising performance; however, its computational requirements are rather substantial. This is true for both off-line and on-line Q-learning. For off-line Q, an extensive simulation is required, including models of both the buyers' demands and the other sellers' strategies. For on-line Q, real competitor prices and buyer responses are used, so that models are not required, but the training process remains long and tedious, and the learner must endure true losses during the training period while exploring suboptimal actions. In future work, we plan to continue investigating adaptive learning approaches such as Q-learning, with an aim towards designing algorithms that are feasible for practical implementation. For example, the Q-function can be represented using function approximators rather than lookup tables. Preliminary (and encouraging) results have been obtained based on neural networks [14], and work based on decision trees is underway. Lastly, while knowledge of competitors' prices seems reasonable (by the very existence of shopbots), accurate models of competitors' strategies or buyer demand may in fact be unavailable in actual agent economies; methods to overcome such limitations remain open to investigation.


next up previous
Next: References Up: Strategic Pricebot Dynamics Previous: Q vs. Q

kephart
Tue Sep 28 21:57:17 EDT 1999