Influence diagrams are a powerful formalism for reasoning with sequential decision making problems under uncertainty. They involve both chance variables, where the outcome is determined randomly based on the values assigned to other variables, and decision variables, which the decision maker can choose the value of, based on observations of some other variables.

Uncertainty is represented (like in a Bayesian network) by a collection of conditional probability distributions, one for each chance variable. The overall value of an outcome is represented as the sum of a collection of utility functions.

In many situations, a decision maker has more than one objective, and mapping several objectives to a single utility scale can be problematic, since the decision maker may be unwilling or unable to provide precise tradeoffs between objectives.

In this project we consider multi-objective influence diagrams, where utility values are now vectors in Rp, with p being the number of objectives. Since utility values are now only partially ordered (for instance by the Pareto ordering) we no longer have a unique maximal value of expected utility, but a set of them.

We also define a simple formalism for imprecise tradeoffs; this allows the decision maker, during the elicitation stage, to specify a preference for one multi-objective utility vector over another, and uses such inputs to infer other preferences.

The induced preference relation then is used to eliminate dominated utility vectors during the computation. Our experimental results indicate that the presence of even a few such imprecise tradeoffs greatly reduces the undominated set of expected utility values.

Challenge
Solution
Benefits

## Example of results

Figure 1 shows the influence diagram of the oil wildcatter problem. An oil wildcatter must decide either to drill or not to drill for oil at a specific site. Before drilling, a seismic test could help determine the geological structure of the site.

The test results 1 can show a closed reflection pattern (indication of significant oil), an open pattern (indication of some oil), or a diffuse pattern (almost no hope of oil). The special value notest is used if no seismic test is performed.

There are therefore two decision variables, T (test) and D (drill), and two chance variables S (seismic results) and O (oil contents). The probabilistic knowledge consists of the conditional probability tables P(O) and P(S|O, T ), while the utility function is the sum of U1(T) and U2(O,D). The optimal policy is to perform the test and to drill only if the test results show an open or a closed pattern.

We consider the bi-objective version of the decision problem where the utility function with two attributes representing the testing/drilling payoff and damage to the environment, respectively. The utility of testing could be for example (−10, 10), whereas the utility of drilling would be (−70, 18), (50, 12), (200, 8) for a dry, wet and soaking hole, respectively.

The aim is to find optimal policies that maximize the payoff and minimize the damage to environment. The dominance relation is defined in this case by ~u  ~v , u1  v1 and u2  v2 (e.g., (10, 2)  (8, 4) and (10, 2)  (8, 1)). Assuming approriate probability values, the Pareto set max{EU | policies } contains four elements, i.e., {(22.5, 17.56), (20, 14.2), (11, 12.78), (0, 0)}, corresponding to the four optimal policies.

Figure 1. A simple influence diagram with two decisions.

The Pareto ordering is rather weak (often leading to very large Pareto optimal sets. Very often the decision maker will be happy to allow some trade-offs between objectives.

For example, in a two-objective situation, they may tell us that are happy to gain 3 units of the first objective at the cost of losing one unit of the second, and hence prefer (3,−1) to (0, 0).

Such tradeoffs may be elicited using some structured method, or in a more ad hoc way. For instance the decision maker may be asked to imagine a scenario where it was known that the oil content was “wet”, and whether they would then have a preference for drilling. To do so would imply a preference of (50, 12) over (0, 0).

## Contribution

In this project we develop efficient variable elimination as well as search based algorithms for computing optimal decision policies in the presence of imprecise tradeoffs between the objectives of the decision problem. The preference relation induced by the tradeoffs is used to eliminate dominated utility vectors during the computation. Through extensive empirical evaluations we show that the presence of even a few such imprecise tradeoffs greatly reduces the undominated set of expected utility values corresponding to optimal policies.

Figure 2 shows the distribution (mean, standard deviation and median) of the size of the maximal sets generated (i.e., optimal policies sets) by our algorithm on a class of decision problems involving 25 chance variables, five decisions and five objectives, as a function of the number of pairwise tradeoffs (K) between the objectives.

As more tradeoffs become available, the number of problem instances solved increases because the dominance relation becomes stronger, which therefore reduces the undominated utility sets significantly.

Figure 2: Distribution (mean, stdev, median) of the size of maximal utility sets as a function of pairwise tradeoffs K and fixed 3-way tradeoffs (T = 3). We also plot the number of instances solved out of 100. Time limit 20 minutes.

## References

A. Botea, E. Daly, A. Kishimoto, and R. Marinescu
Multi-Criteria Journey Aware Housing Recommender System.
In Proceedings of the ACMInternational Conference on Recommender Systems (RecSys), 2014.

R. Marinescu, R. Abdul and N.Wilson