We have constructed a very simplified and restricted
two-seller economy in which the instantaneous utilities of the
two sellers are given either by the price-quality model of
(Sairamesh and Kephart, 1998), or by the
of the information-filtering model of (Kephart et al., 1998).
In both models, cyclic price wars are obtained when the
sellers myopically optimize their instantaneous utilities
without regard to longer-term impact of their pricing policies.
By limiting the system to two sellers with fixed product
differentiation (i.e. quality or information category),
and by modeling the consumers as giving instantaneous,
deterministic responses to the current prices of the two sellers,
we have essentially created a two-player, alternating-turn,
arbitrary-sum Markov game. In this game, both the state-space
transitions and the rewards at each time step are deterministic;
furthermore, the state space is fully observable and the exact utility
functions for both players are known by each player. Finally,
the discretization of the seller prices implies that lookup
table representation of the utilities and optimal prices
are feasible.
Under the above conditions, we have developed two types of
procedures for progressively calculating optimal price tables
based on n-step lookahead. Our main finding is that, at least
for the two specific models studied here, it was possible to
significantly ameliorate the myopic price-war dynamics without
resorting to deep searches. In both models, utilizing
lookahead depths as small as n = 2 was sufficient in at
least some cases to damp out or eliminate price wars.
These results confirmed our intuition that the ability to
anticipate retaliatory responses can be an important mechanism
in avoiding price wars, and provide some degree of hope that
similar findings might be obtained in more realistic scenarios
of larger numbers of agents, and less than perfect information
available to the agents. (If foresight-based algorithms turned
out not to work in the simplest case, there would be no reason to expect
them to work in more realistic cases.) It will be of great interest to
understand the conditions under which this will be the case, and
when mutual prediction of any depth will have little or no impact
on systemic price wars, or possibly even make them worse.
Within the context of our current simulations, one important
area of ongoing and future research is further elaborating
the conditions under which the proposed algorithms converge
to invariant optimal price curves.
Several important challenges will also be faced in extending
the algorithms for agent foresight to larger-scale, more
realistic simulations. In the work reported here, the state
space was small and perfectly known, so that one could hope
to develop an algorithm that could
calculate in advance something like a game-theoretic
optimal pricing algorithm for each agent. When there are a
large number of sellers in the simulation, the seller utilities
and pricing functions will have such high input dimensionality
that it will be infeasible to use
lookup table state-space representations, and most likely
some sort of compact representation combined with a
function approximation scheme will be necessary.
Furthermore, with many sellers, the concept of sellers
taking turns adjusting their prices in a well-defined order
becomes problematic. This could lead to an additional
combinatorial explosion, if the lookahead procedure has to
anticipate all possible orderings of opponent responses.
Finally, although the sellers may
know quite a lot about the other sellers' costs and rewards,
it seems unrealistic to hope that sellers could have full
knowledge of the other sellers' utility functions.
A further set of issues that would be of interest to address
has to do with what agents might do when they observe that
the actual behavior of other agents does not match with
predicted behavior. One could argue that this might not be
of paramount importance. As long as the predicted behaviors
lead to the elimination of price wars, perhaps any further
refinement in prediction accuracy would not lead to significantly
increased rewards. Nevertheless, if one observes other agents
behaving in a way that one believes to be suboptimal, it might
be worth investigating whether such apparent suboptimal behavior
might possibly be exploited by some sort of inductive
modification of the agent's predictions. A number of important
theoretical challenges must be faced in trying to develop
such algorithms. First, there is the issue of generalization:
given that we have observed the opponent behaving in a certain
way at one point in state space, what can we then infer about
how that opponent will behave in other parts of the state space?
Second, there is the issue of exploration: it may be necessary
for the agent to make suboptimal moves in order to reach new
areas of state space, simply to gather more information about
the opponents' behaviors and strategies. Third, there are
additional challenges if one believes that the opponents'
strategies are non-stationary; this is presumably much more
difficult than modeling a fixed strategy. Finally, there are
a host of gamemanship-type issues that could arise;
for example, if we observe
an opponent behaving suboptimally, it may be a trick -- the
opponent may be trying to lead us to develop an incorrect
model which can then be exploited at later times.