 
To appear in Proceedings of Workshop ABS-3: Learning About, From and With other Agents (held in conjunction with IJCAI '99), August 2, 1999, Stockholm

Keywords: reinforcement learning, neural networks, adaptive multi-agent systems, agent economies

Pricing in agent economies using neural networks and multi-agent Q-learning

Gerald Tesauro
IBM T. J. Watson Research Center
30 Saw Mill River Rd., Hawthorne NY, 10532
e-mail: tesauro@watson.ibm.com

Abstract:

This paper investigates how adaptive software agents may use reinforcement learning algorithms such as Q-learning to make economic decisions such as setting prices in a competitive marketplace. For a single adaptive agent facing fixed-strategy opponents, ordinary Q-learning is guaranteed to find the optimal policy. However, when a population of agents each tries to adapt in the presence of other adaptive agents, the problem becomes non-stationary and history-dependent. It is not known whether any global convergence will be obtained, or, if it is, whether the resulting solutions will be optimal. This paper studies simultaneous Q-learning by two competing seller agents in three moderately realistic economic models. This is the simplest case in which interesting multi-agent phenomena can occur, and the state space is small enough that lookup tables can be used to represent the Q-functions. Despite the lack of theoretical guarantees, simultaneous convergence to self-consistent optimal solutions is obtained in each model, at least for small values of the discount parameter. In some cases, such convergence is also found even at large discount parameters. Furthermore, the Q-derived policies increase profitability and damp out or eliminate cyclic price "wars" compared to simpler policies based on zero lookahead or short-term lookahead. The use of function approximators (neural nets) instead of lookup tables is also investigated; preliminary findings indicate that reasonably good policies can be obtained even though the absolute accuracy of the function approximation may be poor.
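To make the setting concrete, the following is a minimal sketch of simultaneous tabular Q-learning by two sellers who alternately set prices. It is not one of the paper's three economic models: the market here is a hypothetical winner-take-all toy (integer prices 1..`N_PRICES`, unit cost `COST`, buyers purchase only from the cheaper seller, ties split demand), and the names `profit`, `train`, and `greedy_policy` are illustrative. As a further simplification, each agent's reward is its one-period profit at the moment it moves, and its next state is the opponent's reply price.

```python
import random

# Hypothetical toy market (an assumption, not one of the paper's models):
# prices are integers 1..N_PRICES, unit cost is COST, and buyers purchase
# only from the cheaper seller; a tie splits demand evenly.
N_PRICES = 10
COST = 1


def profit(my_price, other_price):
    """Per-period profit for one seller in the toy winner-take-all market."""
    margin = my_price - COST
    if my_price < other_price:
        return float(margin)
    if my_price == other_price:
        return margin / 2.0
    return 0.0


def train(episodes=20000, gamma=0.5, alpha=0.1, epsilon=0.1, seed=0):
    """Simultaneous tabular Q-learning by two sellers who alternate price moves.

    Q[i][s][a] is agent i's estimated value of setting price a + 1 when the
    opponent's current price is s + 1 (a lookup table, one per agent).
    """
    rng = random.Random(seed)
    Q = [[[0.0] * N_PRICES for _ in range(N_PRICES)] for _ in range(2)]
    prices = [N_PRICES, N_PRICES]   # both sellers start at the highest price
    pending = [None, None]          # (state, action, reward) awaiting next state
    for t in range(episodes):
        i = t % 2                   # agents take turns setting prices
        j = 1 - i
        s = prices[j] - 1           # state: opponent's current price (0-based)
        # Complete agent i's previous update now that its next state s is known.
        if pending[i] is not None:
            ps, pa, pr = pending[i]
            target = pr + gamma * max(Q[i][s])
            Q[i][ps][pa] += alpha * (target - Q[i][ps][pa])
        # Epsilon-greedy choice of the new price.
        if rng.random() < epsilon:
            a = rng.randrange(N_PRICES)
        else:
            row = Q[i][s]
            a = row.index(max(row))
        prices[i] = a + 1
        pending[i] = (s, a, profit(prices[i], prices[j]))
    return Q


def greedy_policy(Q_i):
    """Map each opponent price to agent i's learned best-response price."""
    return {s + 1: row.index(max(row)) + 1 for s, row in enumerate(Q_i)}
```

For example, `greedy_policy(train(gamma=0.0)[0])` gives a purely myopic (zero-lookahead) pricing rule for agent 0, while larger values of `gamma` let each agent weigh the opponent's likely reply. Because both agents learn at once, each faces a non-stationary opponent, which is exactly why the single-agent convergence guarantee does not carry over.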





kephart
Wed Sep 29 11:51:48 EDT 1999