Here we explore the conditions that determine whether
the symmetric or asymmetric solution is obtained.
We start by adding Gaussian noise,
with mean 0 and amplitude
, to the Q functions
of the symmetric solution.
We then allow Q-learning to run, and observe whether the
symmetric or asymmetric solution is obtained.
An example illustrating the initial perturbed policies
at
and
is shown in Fig. 8; this particular initial
condition happened to evolve to the symmetric solution.
Figure 8: Example initial perturbed policies obtained
by starting from the symmetric solution at
and adding noise with amplitude
.
Final response functions are symmetric; most other trials
at this noise amplitude reached
the asymmetric solution.
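The perturbation step described above can be sketched as follows. This is a minimal illustration, not the code used for the experiments: it assumes each Q function is stored as a 2-D table indexed by (state, action), that the noise "amplitude" is the standard deviation of the Gaussian, and that the initial policy is read off greedily from the perturbed table. The names `perturb_q`, `greedy_policy`, and the placeholder table `q_sym` are hypothetical.

```python
import numpy as np

def perturb_q(q_symmetric, amplitude, rng):
    """Return q_symmetric plus zero-mean Gaussian noise on every entry.

    Assumption: `amplitude` is used as the standard deviation of the noise.
    """
    noise = rng.normal(loc=0.0, scale=amplitude, size=q_symmetric.shape)
    return q_symmetric + noise

def greedy_policy(q):
    """For each state (row), return the index of the action with the largest Q value."""
    return np.argmax(q, axis=1)

rng = np.random.default_rng(0)

# Placeholder table standing in for the converged symmetric Q function;
# the real table would come from a previous Q-learning run.
n_states, n_actions = 5, 5
q_sym = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

# Each player receives an independent perturbation of the same symmetric table.
q1 = perturb_q(q_sym, 0.01, rng)
q2 = perturb_q(q_sym, 0.01, rng)
initial_policies = greedy_policy(q1), greedy_policy(q2)
```

With a small amplitude the greedy policies typically coincide with those of `q_sym`; with a large one they bear little relation to it, which matches the qualitative behavior described in the text.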
As the noise amplitude is increased, the initial policies
tend to be further from the symmetric solution, and the
Q functions tend to evolve more often towards the asymmetric solution.
For each of 16 noise amplitudes ranging from
0.01 to 50.00, 100 trials were conducted. At
,
the noise is so slight that the policies are usually unchanged.
At
, the initial response functions are essentially random.
The percentage of trials that yielded the
symmetric solution as a function of the noise
amplitude for
and
is shown in Fig. 9(a).
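A sweep of this kind might be organized as in the sketch below. Several things here are assumptions rather than details from the experiments: the 16 amplitudes are taken to be logarithmically spaced between 0.01 and 50, the helper `symmetric_fraction` is hypothetical, and `toy_trial` is an arbitrary stand-in for a full Q-learning run whose outputs have no relation to the reported results.

```python
import numpy as np

def symmetric_fraction(run_trial, amplitudes, n_trials, rng):
    """For each amplitude, run n_trials and return the fraction that ended symmetric.

    `run_trial(amplitude, rng)` must return True when the trial converged to
    the symmetric solution and False otherwise.
    """
    fractions = []
    for amplitude in amplitudes:
        outcomes = [bool(run_trial(amplitude, rng)) for _ in range(n_trials)]
        fractions.append(sum(outcomes) / n_trials)
    return np.array(fractions)

# Assumption: log-spaced grid; the text only gives the count and the range.
amplitudes = np.geomspace(0.01, 50.0, num=16)

def toy_trial(amplitude, rng):
    # Arbitrary placeholder, NOT Q-learning: substitute the real
    # perturb-then-learn procedure here.
    return rng.random() < 1.0 / (1.0 + amplitude)

fractions = symmetric_fraction(toy_trial, amplitudes, 100, np.random.default_rng(0))
```

Passing the trial procedure in as a callback keeps the bookkeeping separate from the learning code, so the same loop can be reused for different parameter settings.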
The same data can be viewed in a different way,
by plotting the probability of obtaining the symmetric solution
as a function of the total Manhattan distance of the two noisy
initial response curves from the ideal noise-free symmetric solution.
This is shown in Fig. 9(b).
(The probability is obtained by averaging over 100 trials centered
around each distance.)
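One way to compute this alternative view is sketched below, under stated assumptions: each policy is an array of numeric responses (one per state), both players share the same reference policy in the symmetric solution, and "averaging over 100 trials centered around each distance" is read as a sliding window over trials sorted by distance. The function names are hypothetical.

```python
import numpy as np

def total_manhattan_distance(policy1, policy2, symmetric_policy):
    """Sum of the L1 distances of both players' policies from the symmetric policy."""
    p1, p2, ps = (np.asarray(p, dtype=float)
                  for p in (policy1, policy2, symmetric_policy))
    return float(np.abs(p1 - ps).sum() + np.abs(p2 - ps).sum())

def windowed_probability(distances, is_symmetric, window=100):
    """Sliding-window estimate of P(symmetric) as a function of distance.

    Trials are sorted by distance; each output point is the mean distance and
    the mean outcome over `window` consecutive trials.
    """
    order = np.argsort(distances)
    d = np.asarray(distances, dtype=float)[order]
    s = np.asarray(is_symmetric, dtype=float)[order]
    kernel = np.ones(window) / window
    return (np.convolve(d, kernel, mode="valid"),
            np.convolve(s, kernel, mode="valid"))

# Tiny worked example with a window of 2 trials.
mean_d, prob = windowed_probability([3, 1, 2, 0], [0, 1, 0, 1], window=2)
```

In the toy example the sorted distances are 0, 1, 2, 3 with outcomes 1, 1, 0, 0, so the windowed probability falls from 1.0 to 0.5 to 0.0 as the mean distance grows.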
In addition to the randomness of the starting state, the random exploration dynamics of Q-learning also influence the final state. In experiments that started many trials from the same random starting state, some trials converged to the symmetric solution while others went to the asymmetric solution.
Fig. 9
supports the conceptual interpretation of two-player Q-learning
dynamics in terms of a basin of attraction around the symmetric
solution, delineated by a distance parameter in either policy
space or in Q-function space. In both spaces, there is
a small region around the symmetric solution such that
virtually any starting state within the region
converges to the symmetric solution. For low
, the symmetric
solution can be reached even when the starting state is well outside
this region, with roughly a 30% chance when
. For moderate
to high
, the symmetric solution cannot be reached if the
starting state lies beyond this region. This explains why we only
observed the asymmetric solution for moderate
to high
in our earlier studies [7].
Figure 9: Probability of obtaining the symmetric
solution, starting from randomly perturbed initial
Q-functions, at
and
. a) As a function of noise
amplitude
. b) As a function of
total Manhattan distance of initial policies from the
ideal symmetric solution.