
Difficulties of neural network training

Preliminary results on combining Q-learning with neural networks were reported in (Tesauro, 1999). The neural nets typically appeared to reach peak profitability within a few hundred sweeps through the training cases (corresponding to a few hours of CPU time). The policies at this point were reasonably good, and qualitatively similar to the lookup table policies, but the Q-function itself was poorly approximated, as indicated by a large Bellman error. With much further training (up to several days of CPU time), the Bellman error improved significantly, but policy profitability did not. It is possible that enough additional training would yield further improvements in profitability, but the required training times appeared to be prohibitively long.
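To make the Bellman error diagnostic concrete, the following is a minimal sketch of how such an error might be measured for an arbitrary Q-function approximator. It assumes the standard single-agent form of the Bellman residual, r + gamma * max_a' Q(s', a') - Q(s, a), averaged as a root mean square over a set of sampled transitions; the exact error measure used in the experiments is not specified here. The names `rms_bellman_error`, `q`, `transitions`, `actions`, and `gamma` are hypothetical and chosen only for illustration.

```python
import math
from typing import Callable, Hashable, Iterable, Sequence, Tuple

# One sampled transition: (state, action, reward, next_state).
Transition = Tuple[Hashable, Hashable, float, Hashable]


def rms_bellman_error(
    q: Callable[[Hashable, Hashable], float],
    transitions: Iterable[Transition],
    actions: Sequence[Hashable],
    gamma: float,
) -> float:
    """Root-mean-square Bellman residual of q over the given transitions.

    Assumption: the residual is the standard one-step Q-learning form,
    r + gamma * max_a' q(s', a') - q(s, a). The function q may be a
    lookup table, a neural network, or any other approximator.
    """
    total = 0.0
    count = 0
    for state, action, reward, next_state in transitions:
        # One-step backed-up target under the greedy policy for q.
        target = reward + gamma * max(q(next_state, a) for a in actions)
        residual = target - q(state, action)
        total += residual * residual
        count += 1
    if count == 0:
        raise ValueError("no transitions supplied")
    return math.sqrt(total / count)
```

A small residual means that q is nearly self-consistent under the Bellman equation. As the results above suggest, a policy can be reasonably profitable while this residual is still large, because the greedy policy depends only on the ranking of actions in each state and not on the accuracy of the Q-values themselves.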



kephart
Tue Mar 21 00:52:15 EST 2000