
Q-Learning Simulations


The MY, DF, and GT pricing strategies studied in the previous section are all predefined: they remain fixed throughout a simulation run. In this section, we study Q-learning, which includes a training period during which Q pricebots adapt to specific opponent strategies. Simulation results of Q-learning against each of the four pricebot strategies (including Q-learning itself) are presented below. Because the Q-function is represented as a lookup table, which becomes impractically large as the number of pricebots grows, these simulations were limited to 2 pricebots. In future work, we plan to study Q-learning with multiple pricebots, using function approximators (e.g., neural networks and decision trees) rather than lookup tables to represent the Q-functions. Details of our Q-learning methodology are presented in Tesauro and Kephart [15].
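To make the lookup-table idea concrete, here is a minimal sketch of generic epsilon-greedy tabular Q-learning for one pricebot facing a single fixed opponent. It is not the methodology of Tesauro and Kephart [15]. Everything here is a hypothetical stand-in chosen for illustration: the price grid `PRICES`, the winner-take-all `profit` function, the undercutting `opponent_response`, the learning parameters, and the state definition (the state is the opponent's current price, the action is our own price). The helper `table_size` assumes a state that records every opponent's price, so the table has one entry per combination of opponent prices and own price; this shows why a lookup table grows quickly with the number of pricebots.

```python
import random

# Hypothetical discrete price grid (0.50, 0.55, ..., 1.00); not taken from the paper.
PRICES = [round(0.5 + 0.05 * i, 2) for i in range(11)]


def profit(my_price, opp_price, cost=0.5):
    """Hypothetical toy profit: the lower-priced seller wins the sale; a tie splits it."""
    if my_price < opp_price:
        return my_price - cost
    if my_price == opp_price:
        return 0.5 * (my_price - cost)
    return 0.0


def opponent_response(my_idx):
    """Hypothetical fixed opponent: undercut by one grid step, else jump to the top price."""
    return my_idx - 1 if my_idx > 0 else len(PRICES) - 1


def train(steps=20000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Online epsilon-greedy tabular Q-learning against the fixed opponent above."""
    rng = random.Random(seed)
    n = len(PRICES)
    # Lookup table: row = opponent's price index (state), column = our price index (action).
    q = [[0.0] * n for _ in range(n)]
    state = n - 1  # assume the opponent starts at the highest price
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n)  # explore
        else:
            action = max(range(n), key=lambda a: q[state][a])  # exploit
        reward = profit(PRICES[action], PRICES[state])
        next_state = opponent_response(action)  # opponent reacts to our new price
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


def greedy_policy(q):
    """Map each opponent price to the price with the highest learned Q-value."""
    return {PRICES[s]: PRICES[max(range(len(row)), key=row.__getitem__)]
            for s, row in enumerate(q)}


def table_size(num_prices, num_pricebots):
    """Entries in a table indexed by all opponents' prices plus our own price."""
    return num_prices ** num_pricebots


q = train()
print(greedy_policy(q))
print(table_size(len(PRICES), 2), table_size(len(PRICES), 5))  # 121 161051
```

With 11 price levels, the two-pricebot table has 121 entries, while the same state encoding for five pricebots would need 161,051. This growth is the kind of pressure that motivates replacing the table with a function approximator.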



kephart
Tue Sep 28 21:57:17 EDT 1999