next up previous
Next: References Up: Multi-agent Q-learning and regression Previous: Results

Conclusions

We have studied a simple combination of tree-based function approximation with single-agent and multi-agent Q-learning, and have found highly encouraging results in three different models of agent economies, each with a different mechanism for generating price wars. This suggests that the success of the algorithm is not due to specific details of any single model, and it is likely to be more generally applicable in agent economies.

The main contributions of this work are twofold. First, an important open question in Reinforcement Learning research is how to combine RL methods such as Q-learning with nonlinear function approximation. Most of the empirical research on this topic has involved neural networks. Our work, along with (Wang and Dietterich, 1999), is among the first to demonstrate success in combining Q-learning with regression trees. Our results show clear advantages over the neural net approach, and thus provide impetus for further studies of tree-based methods.

The second contribution is that with regression trees, Q-learning appears much more feasible as a practical approach to learning strategies in multi-agent economies. With lookup tables, Q-learning generally performed well in terms of training time and policy profitability, both in the single-agent and multi-agent cases, but is expected to scale poorly with the number of agents. Regression trees are expected to offer much better scaling. We find that they train quickly, achieve profits equal or very close to the lookup table policies, and offer significant advantages in terms of tree size and amount of training data required.

One clearly important direction of future research is to examine how well regression trees perform in economic models with more agents and more realistic details. (Preliminary results indicate that in a three-seller case, regression trees still seem to perform well with a manageable increase in the size of the trees.) It also seems likely that more sophisticated methods for generating splits and leaf-node function approximation, such as oblique splits and linear function approximation, could be of clear benefit. It will also be important to resolve how to adapt our batch training methodology to online learning, which is inherently incremental.

Finally, we have observed empirical behavior of multi-agent Q-learning, such as approximate but not exact convergence, that has no analog in ordinary Q-learning, where the Bellman equation is known to be a contraction mapping, leading to a unique global attractor with zero residual error. The empirical results here, together with those of (Sandholm and Crites, 1995) for Prisoner's Dilemma, suggest that more interesting theoretical results may be available in the multi-agent case. Depending on the payoff functions and on tex2html_wrap_inline235 , there may be a single global attractor, or multiple basins of attraction, or no attractor dynamics at all. Also, the attractors may be small finite regions or limit cycles rather than points, with non-zero asymptotic residual error.


next up previous
Next: References Up: Multi-agent Q-learning and regression Previous: Results

kephart
Tue Mar 21 00:52:15 EST 2000