We have constructed a very simplified and restricted two-seller economy in which the instantaneous utilities of the two sellers are given either by the price-quality model of (Sairamesh and Kephart, 1998), or by the of the information-filtering model of (Kephart et al., 1998). In both models, cyclic price wars are obtained when the sellers myopically optimize their instantaneous utilities without regard to longer-term impact of their pricing policies. By limiting the system to two sellers with fixed product differentiation (i.e. quality or information category), and by modeling the consumers as giving instantaneous, deterministic responses to the current prices of the two sellers, we have essentially created a two-player, alternating-turn, arbitrary-sum Markov game. In this game, both the state-space transitions and the rewards at each time step are deterministic; furthermore, the state space is fully observable and the exact utility functions for both players are known by each player. Finally, the discretization of the seller prices implies that lookup table representation of the utilities and optimal prices are feasible.
Under the above conditions, we have developed two types of procedures for progressively calculating optimal price tables based on n-step lookahead. Our main finding is that, at least for the two specific models studied here, it was possible to significantly ameliorate the myopic price-war dynamics without resorting to deep searches. In both models, utilizing lookahead depths as small as n = 2 was sufficient in at least some cases to damp out or eliminate price wars. These results confirmed our intuition that the ability to anticipate retaliatory responses can be an important mechanism in avoiding price wars, and provide some degree of hope that similar findings might be obtained in more realistic scenarios of larger numbers of agents, and less than perfect information available to the agents. (If foresight-based algorithms turned out not to work in the simplest case, there would be no reason to expect them to work in more realistic cases.) It will be of great interest to understand the conditions under which this will be the case, and when mutual prediction of any depth will have little or no impact on systemic price wars, or possibly even make them worse.
Within the context of our current simulations, one important area of ongoing and future research is further elaborating the conditions under which the proposed algorithms converge to invariant optimal price curves. Several important challenges will also be faced in extending the algorithms for agent foresight to larger-scale, more realistic simulations. In the work reported here, the state space was small and perfectly known, so that one could hope to develop an algorithm that could calculate in advance something like a game-theoretic optimal pricing algorithm for each agent. When there are a large number of sellers in the simulation, the seller utilities and pricing functions will have such high input dimensionality that it will be infeasible to use lookup table state-space representations, and most likely some sort of compact representation combined with a function approximation scheme will be necessary. Furthermore, with many sellers, the concept of sellers taking turns adjusting their prices in a well-defined order becomes problematic. This could lead to an additional combinatorial explosion, if the lookahead procedure has to anticipate all possible orderings of opponent responses. Finally, although the sellers may know quite a lot about the other sellers' costs and rewards, it seems unrealistic to hope that sellers could have full knowledge of the other sellers' utility functions.
A further set of issues that would be of interest to address has to do with what agents might do when they observe that the actual behavior of other agents does not match with predicted behavior. One could argue that this might not be of paramount importance. As long as the predicted behaviors lead to the elimination of price wars, perhaps any further refinement in prediction accuracy would not lead to significantly increased rewards. Nevertheless, if one observes other agents behaving in a way that one believes to be suboptimal, it might be worth investigating whether such apparent suboptimal behavior might possibly be exploited by some sort of inductive modification of the agent's predictions. A number of important theoretical challenges must be faced in trying to develop such algorithms. First, there is the issue of generalization: given that we have observed the opponent behaving in a certain way at one point in state space, what can we then infer about how that opponent will behave in other parts of the state space? Second, there is the issue of exploration: it may be necessary for the agent to make suboptimal moves in order to reach new areas of state space, simply to gather more information about the opponents' behaviors and strategies. Third, there are additional challenges if one believes that the opponents' strategies are non-stationary; this is presumably much more difficult than modeling a fixed strategy. Finally, there are a host of gamemanship-type issues that could arise; for example, if we observe an opponent behaving suboptimally, it may be a trick -- the opponent may be trying to lead us to develop an incorrect model which can then be exploited at later times.