Jeffrey O. Kephart and Gerald J. Tesauro
IBM Institute for Advanced Commerce
IBM Thomas J. Watson Research Center
P.O. Box 704, Yorktown Heights, NY 10598, USA
kephart@us.ibm.com, tesauro@watson.ibm.com
We study novel aspects of multi-agent Q-learning in a model market in which two identical, competing "pricebots" strategically price a commodity. Two fundamentally different solutions are observed: an exact, stationary solution with zero Bellman error consisting of symmetric policies, and a non-stationary, broken-symmetry pseudo-solution with small but non-zero Bellman error. This "pseudo-convergent" asymmetric solution has no analog in ordinary Q-learning. We analytically calculate the form of both solutions and numerically map out the conditions under which each occurs. We suggest that this behavior will also be found more generally in other studies of multi-agent Q-learning, and we discuss implications and directions for future research.
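The distinction between an exact solution (zero Bellman error) and a pseudo-solution (small but non-zero Bellman error) can be made concrete with a minimal sketch. It assumes a generic tabular setting with deterministic transitions, not the pricebot market itself; the names `bellman_error` and `solve_q`, and the toy two-state reward and transition tables, are hypothetical and chosen only for illustration. The Bellman error is measured here as the largest absolute gap between Q(s, a) and its one-step target r(s, a) + γ max_a' Q(s', a').

```python
import numpy as np


def bellman_error(Q, reward, next_state, gamma):
    """Largest absolute Bellman residual of a tabular Q-function.

    Q[s, a]:          current Q-value estimates
    reward[s, a]:     immediate reward for action a in state s (assumed deterministic)
    next_state[s, a]: index of the successor state (assumed deterministic)
    """
    # Q[next_state] has shape (S, A, A); max over the last axis gives max_a' Q(s', a').
    target = reward + gamma * Q[next_state].max(axis=-1)
    return np.abs(Q - target).max()


def solve_q(reward, next_state, gamma, iters=1000):
    """Iterate the Bellman optimality operator to (near) its fixed point."""
    Q = np.zeros_like(reward, dtype=float)
    for _ in range(iters):
        Q = reward + gamma * Q[next_state].max(axis=-1)
    return Q


# Hypothetical two-state, two-action example.
reward = np.array([[1.0, 0.0], [0.0, 2.0]])
next_state = np.array([[0, 1], [1, 0]])
gamma = 0.5

Q_star = solve_q(reward, next_state, gamma)
err_exact = bellman_error(Q_star, reward, next_state, gamma)

# Nudging a single entry away from the fixed point leaves a small non-zero residual.
Q_off = Q_star.copy()
Q_off[0, 0] += 0.1
err_off = bellman_error(Q_off, reward, next_state, gamma)

print(err_exact, err_off)
```

In this toy case the fixed point is Q* = [[2, 1.5], [1.5, 3]], so `err_exact` is essentially zero while the perturbed table has a residual of about 0.05. In the multi-agent setting studied here, each agent's effective transitions also depend on the other agent's evolving policy, which is one way a non-stationary pseudo-solution can keep a small residual without ever reaching zero.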