next up previous
Next: Summary of the utility Up: Foresight-based pricing algorithms in Previous: Foresight-based pricing algorithms in

Introduction

In prior work (Kephart et al., 1998; Sairamesh and Kephart, 1998) it has been shown that in large-scale economies of software agents, the potential exists for unending cycles of disastrous competitive ``wars'' in price/product space. Such price and niche wars are disastrous not only for the sellers in such economies (e.g. information sellers in an information filtering economy), but they can also be disastrous for the consumers as well. Price wars have the potential to be more rampant in agent economies than what we have observed in human economies due to a number of differences between agent and human economic players. Among such differences are: (1) greater ability of humans to predict long-term consequences of their price-setting actions; (2) reduced frictional effects such as consumer inertia in agent economies; and (3) reduced localization effects due to much greater connectivity offered by the Internet.

In this paper, we focus specifically on the use of foresight or predictive capability by software agents to anticipate retaliatory responses that will be made by other agents. Our intuition is that this is the most important factor in models studied so far that generate price wars. In (Kephart et al., 1998), it was shown that price-war phenomena are generally obtained in agent economies when the agents can do a good job of optimizing their immediate short-term reward or utility, but have no capability to predict system behavior beyond the current time step. Such ``myopic optimal'' agents may be tempted to engage in price-undercutting behavior to obtain an immediate short-term advantage. However, if the other agents retaliate by further undercutting, then the agent may be worse off in the long term than if it had kept its price at a high level and not attempted any undercutting. Our intuition is that if agents can be endowed with some sort of predictive capability that could anticipate longer-term consequences of current pricing behavior, then this could allow the agents to realize the futility of undercutting. Implementing foresight mechanisms on a large scale throughout an agent economy might then lead to radically different system behavior, and price wars might be greatly reduced or eliminated.

In considering algorithmic approaches to agent foresight, one set of issues that arises has to do with which type of algorithms will enable agents to make the most accurate predictions, given the constraints of a limited information set and limited computational resources. An additional set of issues has to do with what sort of collective system consequences result when a significant fraction of the agents base their actions on predictions. Will a society of machine learners converge to something like a game-theoretic solution (for example, a Nash equilibrium), or will they just endlessly chase one another's tails? The former question has been studied, for example, in (Milgrom and Roberts, 1991) and (Foster and Vohra, 1997), while the latter issues are beginning to be addressed, for example, in (Hu and Wellman, 1996) and (Vidal and Durfee, 1998).

In the present work, we are primarily concerned with issues of the depth and accuracy of agent lookahead. That is to say, how far ahead in time can an agent reliably predict those aspects of the future system behavior that are relevant to their current decisions, and how much lookahead is necessary to avoid pathological behavior? Is shallow lookahead sufficient, or is it necessary for agents to engage in deep lookahead in order to avoid price wars? Likewise, we would like to know how accurately an agent can predict, and how much accuracy is needed to avoid pathological behavior? It may be the case that only a coarse prediction is needed to avoid price wars. (For example, will any other agent set a price lower than mine on the next time step?)

Our proposed algorithms for agent foresight are designed to avoid the classic problem of infinite recursion of opponent models. That is to say, when modeling other agents, one needs to take into account the fact that those other agents are themselves using models of other agents, and that those models need to take into account that the other agents are using models, etc.. This can lead not only to logical problems in setting up the agent models, but also to greatly increasing levels of computational complexity with the depth of recursion. For example, in the work of (Vidal and Durfee, 1998), a recursive modeling scheme is proposed in which 0-level agents do no opponent modeling, 1-level agents model the other agents as being 0-level agents, 2-level agents model the other agents as being 1-level agents, etc.. In this scheme, the computational requirements greatly increase with the level of modeling, and furthermore, there is no adequate way for an agent to model other agents as being at the same level of depth.

We consider two basic heuristic approaches to avoiding an infinite recursion of opponent models. The first approach is adapted from the domain of two-player zero-sum games such as chess, in which full-width minimax search to a fixed finite depth has been found to be an effective algorithm. In this case, the infinite recursion is cut off by the finite depth of the search. Minimax search is only guaranteed to find optimal moves if the search goes all the way to the end of the game; however, it does seem to work well in practice for searches of lesser depth. For example, the chess machines Deep Thought and Deep Blue give the impression of generating sophisiticated positional understanding as an emergent property from deep searches plus simple positional knowledge built into the evaluation function.

The second approach that we explore is to adapt algorithms from the fields of Dynamic Programming (DP) and Reinforcement Learning (RL); such algorithms have been found to work well for single agents in stationary Markov environments (i.e. Markov Decision Problems, or MDPs). The basic idea of DP/RL is ``Policy Iteration'' (Bertsekas, 1995), in which one starts with an initial policy, computes the value function induced by that policy, and then computes an improved policy that is greedy with respect to that value function. Policy iteration is guaranteed to converge to the optimal agent policy for single-agent MDPs. Recently there has been some work generalizing DP-type algorithms to two-player Markov games. For example, (Littman, 1994) introduced an algorithm called minimax-Q for two-player zero-sum games in which the players alternately take turns moving, which is guaranteed to converge to the optimal policies for both players. Unfortunately, we can't directly use minimax-Q in agent economies because the agent utilities are not strictly zero-sum. We investigate in this paper several heuristic DP-like approaches which can be used in arbitrary-sum games.

As a general caveat, we should point out that both classes of algorithms mentioned above seek deterministic optimal policies. (Here ``optimal'' is used in the game-theoretic sense of the best worst-case behavior against all possible opponent strategies.) However, it may the case that such deterministic policies do not exist. For example, in the game of rock-paper-scissors, any deterministic policy can be defeated, and the best policies are non-deterministic and cannot be computed by these methods.


next up previous
Next: Summary of the utility Up: Foresight-based pricing algorithms in Previous: Foresight-based pricing algorithms in

Jeff Kephart
Sat Aug 15 01:14:48 EDT 1998