1. Introduction
A greedy algorithm for solving an optimization problem successively maximizes the variables in some chosen order. Greedy algorithms are widely used in many contexts as a heuristic, with a hope and a prayer (sometimes even a proof) that the rapid calculation will yield an acceptable answer. In this essay, however, we are interested specifically in linear programming problems and in cases where the greedy algorithms actually produce correct answers.
While such cases are rare, they have been influential in linear programming history. A sympathetic interpretation of the 1781 paper by Monge [1] allows us to interpret a passage in that remarkable pamphlet as the description of a linear programming problem (actually of the important subclass known as the transportation problem [2]) which is amenable to a greedy solution. In fact, this notion was eponymized in [3] to celebrate its pedigree. A survey of the impact of the Monge concept is given in [4]. We provide a technical description later.
A second influence has been the use of the idea of greedy algorithms for linear programming to give a general context to some famous combinatorial algorithms. A leading example is the introduction by Edmonds [5] of the concept of polymatroid, and the interplay between greedy algorithms on polymatroids with greedy algorithms on matroids (which, in turn, generalize some algorithms for constructing minimum spanning trees for a graph). We should also mention in this spirit the notion of (0, 1) matrices in which the 1's in each row (or in each column) occur consecutively. Although introduced [6] in the first place to provide a greedy (and correct) solution to the caterer problem [7], it became an essential ingredient in many arguments used to prove combinatorial theorems using linear programming duality.
There is, unfortunately, no survey of the subject of greedy algorithms in linear programming more recent than [8], which is more than 15 years old. Our aim in this paper is to review an interesting development in greedy theory in a series of related papers appearing in the last few years, to describe the common themes, to offer generalizations of two of these themes, and to comment on weaknesses or lacunae remaining in our treatment. Central to this work being summarized and extended are the concepts (to be defined below) of partially ordered set and submodular functions.
2. Definitions; products of chains
A partially ordered set P is a set of elements {a, b, c, ... }, together with a binary relation “<” on some pairs such that a < b, b < c imply a < c, and a < a is never true. Also, a > b means b < a. We say that a and b are comparable if a = b, a < b, or b < a. A chain is a partially ordered set in which any two elements are comparable. P is said to be a lattice if, for every pair of elements a, b, there is a unique least upper bound a b in P satisfying
c ≥ a, c ≥ b imply c ≥ a b,
| (1a) |
and there is a unique greatest lower bound a b in P satisfying
c ≤ a, c ≤ b imply c ≤ a b.
| (1b) |
We later introduce the more general notion of pseudolattice, but we postpone this to Section 3. In the meantime, we frequently use L to denote the partially ordered set with a lattice structure.
A real function r defined on a lattice L is said to be
modular if r(a b) + r(a b) = r(a) + r(b);
| (2a) |
submodular if r(a b) + r(a b) ≤ r(a) + r(b);
(2b) |
supermodular if r(a b) + r(a b) ≥ r(a) + r(b).
(2c) |
Let k chains C1, C2, ... , Ck be given, with respective cardinalities c1, c2, ... , ck. There is a natural partial ordering on the Cartesian product C1 × ... × Ck that yields a lattice. A submodular function r( ) on this lattice is called a Monge array in case k = 2, and we use the same term for general k. Now consider the linear programming problem arising as follows: Construct a (0, 1) matrix A with c1 c2 ... ck rows and c1 + c2 + ... + ck columns defined as follows: The rows correspond to the Cartesian product of the chains, and the columns to the union of the chains, with a 1 in a particular row and column if the element corresponding to that column is one of the multiplicands in the Cartesian product corresponding to the row. Let d be a given nonnegative vector with c1 + ... + ck coordinates corresponding to the columns of A, with r( ) being a submodular function (Monge array) on the product lattice C1 × ... × Ck corresponding to the rows of A. Our linear programming problem is the following: Choose a nonnegative y that satisfies y'A = d', and of all such y, choose one that minimizes (y, r). It is shown in [9] that the following greedy algorithm solves the problem: Let t be the row of A corresponding to the greatest element (say t) of the product lattice, the least entry of d corresponding to a 1 in t. Let yt = , replace d with d  t, delete from A the column containing , and continue. This greedy algorithm terminates with an optimum solution, or the news that the problem is not feasible. (This news would be using up the rows of A without attaining a feasible solution.)
Next, consider the following variation of the polymatroid. Let r be a submodular function defined on the lattice of subsets of a finite set U, with r( ) = 0. Let A be the (0, 1) matrix whose rows are the incidence vectors of the subsets of U, d a nonnegative vector with coordinates corresponding to elements of U. Consider the linear programming problem y'A = d, y ≥ 0, minimize (y, r). Again, a greedy algorithm starting with the greatest element in the lattice—U itself—solves the optimization problem.
In [10] it was pointed out that the validity of the greedy algorithm for polymatroids can be shown to be a consequence of the validity of the greedy chain-product algorithm described above. (Among other interesting items in [10] is a proof that the chain-product algorithm is valid over any sublattice of the chain-product, and an allusion to the theorem of Topkis about submodular functions on chain-products; we comment on both later.)
The reasoning in [10] is that, in the polymatroid problem, we can replace each element u of U with a two-element chain. The greater element u' corresponds to including the element u in a subset; the lesser element u" corresponds to omitting the element. This is how the subsets of U are coded. Of course, we need to choose values of the d vector to correspond to the different u", but it is easy to see how this is accomplished. Thus, [10] succeeds in folding this class of polymatroids into the multi-dimensional Monge array.
Further work [11–13] on topics related to those in [10] is discussed later.
3. Modular, consecutive (0, 1) functions
In this section, we present an alternative approach to the greedy algorithms discussed in Section 2.
Let U be a finite set and L a lattice, with least element m and greatest element M. We are given functions f:L → 2U and r:L → . We assume that
if f(a) = , then a = m and r(m) = 0.
| (3) |
For each u ∈U, let f(L, u) be defined by
f(a, u) = 1 if u ∈f(a), 0 if u f(a).
| (4) |
We assume that the function f satisfies the following:
for all u ∈U, f(L, u) is modular,
and
|
for all a ≠ b, f(a) ≠ f(b).
| (5) |
In addition to the given functions f and r, we are also given a nonnegative function d:U → , and we consider the following linear programming problem:
| (6) |
where
x(a) ≥ 0 for all a ∈L,
and
For this problem, we propose a greedy algorithm GL. In the first step, we set x(M) to be the minimum of all d(u) such that u ∈f(M), and we let u* be one of the (possibly several) us where the minimum is attained. If we delete from L all elements a such that u* ∈f(a), subtract d(u*) from all d(u) such that u ∈f(M), and delete u* from U, it is observed from (3), (4), and (5) that the remaining elements form a lattice, and these same stipulations still hold for the remaining lattice and remaining U, on which d is still a nonnegative function. We continue this algorithm indefinitely, until we are compelled to stop. We may stop because the remaining d still has at least one positive coordinate, but the remaining part of L is empty or consists just of m with f(m) = 0, or we may stop because d has been reduced to 0.
Theorem 1
Assume that (6) is feasible. Assume also that (3), (4), and (5) hold. Then GL produces an optimum solution to (6) if the following additional conditions hold:
|
f(a) ⊃ f(b) implies a > b,
| (8) |
|
a < b < c, f(a, u) = f(c, u) = 1 imply f(b, u) = 1.
| (9) |
The reader should note that the consecutivity condition (9) from [6] is significant in lattice polyhedra generally (see [8, 14–16]). What is different here from general lattice polyhedra is (8), which specifies relations among the various f(L, u) for different u, whereas we are usually concerned just with properties of f(L, u) for each u by itself. But assumption (8) is unavoidable: In fact, if any one of (7), (8), or (9) is false, there is an instance of (6) in which the greedy algorithm GL does not produce an optimum solution.
Proof
To prove the theorem, all we need prove is the following: There exists a feasible solution x* to (6) for which
The reason is that induction will then justify every step of GL.
By the hypothesis of the theorem, (6) is feasible. Further, the feasible region is bounded, so an optimum solution to (6) exists. Let x be an optimum solution in which x(M) is as large as possible, but x(M) is less than d(u*). This means that, for each u ∈f(M), there exists at least one a ∈L, a < M, such that
|
x(a) > 0 and u ∈f(a).
| (11) |
Let a be a maximal element of the partially ordered set L satisfying (11). By (8) and (5), there is at least one u ∈f(M) (say u'), that is not in f(a). Thus, there exists at least one element b ∈L, b ≠ a, so that
u' ∈f(b), u' f(a).
| (12) |
Suppose that b and a are comparable. Then, by the definition of a, b < a, and we would have, with M > a > b, and (12), a violation of (9). Hence, a and b are incomparable. Let be a positive number not greater than the minimum of x(b) and x(a). Reduce x(a) and x(b) by , and raise x(a b) and x(a b) by . By (5), the new x is feasible; by (7), the new x is optimal. If M = a b, we have contradicted the definition of x. If a b ≠ M, we still have a b > a, and have contradicted the definition of a. Hence, we have proved (10) and the theorem.
It remains to show that each of (7), (8), and (9) is needed. The necessity of (7) is obvious, and indeed was pointed out in [17]. For (9), let L be the chain a < b < c, U = {1, 2, 3}, f(a) = {1, 2}, f(b) = {1, 3}, and f(c) = {2, 3}. Let d(1) = d(2) = d(3) = 1, and let r(a) = r(b) = r(c) = 1. Then (7) and (8) are true, (9) is not, and GL does not produce an optimal solution; it does not even produce a feasible solution, though one exists.
For (8), let L, U, r, and d be as above. Let f(a) = {1, 2}, f(b) = {1, 2, 3}, f(c) = {3}. Let r(a) = r(b) = r(c) = 1. Then GL does not produce an optimal solution, even though (7) and (9) are true.
Observe that the theorem clearly covers the case of polymatroids. It also covers the case of chain-products, as we see in the next section. Besides its conclusions, our theorem has the feature of divorcing (partly, but not totally) the lattice L from the set U. In the other examples, and others to be cited later, U comes from the definition of L. However, our separation is not complete because of conditions (5) and (8).
4. Products of lattices
In this section we consider, more generally than the product of chains in [10], the product of lattices and the extension of the results in Section 3 to such products.
Theorem 2
Let L1, ... , Lk be lattices; U1, ... , Uk disjoint sets; fi:Li → Ui, i = 1, ... , k, each satisfying the conditions in Section 3. If we define the product lattice L ≡ L1 × ... × Lk ≡ {(a1, ... , ak):ai ∈Li, i = 1, ... , k}, with a ≤ b if and only if ai ≤ bi for all i; and we define
f ≡ (f1, …, fk) → U = U1 ∪ … ∪ Uk
by
f(a1, …, ak) = f(a1) ∪ … ∪ f(ak),
we will have satisfied the conditions of Section 3 for f, L, and U.
If, in addition, we introduce a submodular r:L → R, we have extended the material on chain-products in [10] to lattice products. It is interesting to observe, however, some facts about submodular r defined on lattice products.
Topkis [18] proved that for a product C = C1 × ... × Ck of chains, r is submodular on C if
|
r(a ∨ b) + r(a ∧ b) ≤ r(a) + r(b)
| (13) |
holds when a = (a1, ..., ak) and
|
b = (b1, …, bk) differ in at most two coordinates.
| (14) |
We will prove Theorem 3.
Theorem 3
Let L = L, × ... × L, r:L → . Let a = (a1, ... , ak) and b = (b1, ... , bk). Assume that (13) holds in the following cases:
For each i, ai and bi are comparable, and
|
ai = bi with two exceptions.
| (15) |
and
|
ai = bi with one exception.
| (16) |
Then r: L → is submodular.
Note that if L is the product of chains, (16) is immediate, and (15) is the same as (14). We think it interesting that proving r submodular on L needs only (15) when L is restricted to be the product of two element chains.
To prove that (15) and (16) imply (13), we first consider the case k = 2. Assume a = (a1, a2), b = (b1, b2), and define
c1 = (a1 ∨ b1), d1 = (a1 ∧ b1);
c2 = (a2 ∨ b2), d2 = (a2 ∧ b2);
By (16),
|
r(a1, c2) + r(a1, d2) ≤ r(a1, a2) + r(a1, b2),
r(b1, c2) + r(b1, d2) ≤ r(b1, a2) + r(b1, b2),
r(c1, a2) + r(d1, a2) ≤ r(a1, a2) + r(b1, a2),
r(c1, b2) + r(d1, b2) ≤ r(a1, b2) + r(b1, b2),
| (17) |
By (15),
|
r(c1, c2) + r(a1, b2) ≤ r(c1, b2) + r(a1, c2),
r(c1, c2) + r(b1, a2) ≤ r(c1, a2) + r(b1, c2),
r(d1, d2) + r(a1, b2) ≤ r(d1, b2) + r(a1, d2),
r(d1, d2) + r(b1, a2) ≤ r(d1, a2) + r(b1, d2),
| (18) |
Adding one half of (17) and (18) yields (13).
We now consider the case of general k ≥ 3, assuming (inductively) that the theorem has been proved for k 1. Let us consider L = L1 × ... × Lk as L = L*1 × Lk, where L*1 = L1 × ... × Lk-1. This brings us back to the case of two lattices, but we must verify the hypothesis.
First, for fixed ak, r is a submodular function on L*1 by induction. For fixed a ∈L*1, r is a submodular function on Lk by the original hypothesis. Next, (15) holds, from the theorem of Topkis. Thus, the induction step is complete.
5. Pseudolattices
In [11–13], further examples in the spirit of [10] were presented. To explain them, we first define the following: In a partially ordered set P, we say that b covers a if a < b and there is no c such that a < c < b. Next, an ideal of P is a subset such that a ∈ , b < a imply b ∈ . Observe that the union (and also the intersection) of two ideals is an ideal. An antichain P is a set of mutually incomparable elements. If is an antichain, then [ ] denotes the ideal of all b ∈P such that b ≤ a for some a ∈ . If is an ideal, + is the antichain consisting of the maximal elements of .
Let a partially ordered set P be given, and consider the lattice L(P) of all ideals contained in P. [L(P) is itself a partially ordered set, ordered by inclusion, with union and intersection respectively the operations and .] Let be the set of elements of P, and define a mapping f:L(P) → 2U by
f( ) = +.
| (19) |
It is shown in [11] that, when P is a forest, a suitable generalization of the greedy algorithm described in [10] is valid. Note that, when P is a forest, the mapping (19) satisfies the hypothesis (and conclusion) of Theorem 1 above. If P is not a forest, then the greedy algorithm is not in general valid. But [12] points out that an additional hypothesis about the submodular function r:L(P) → , if e is covered by at least two elements, then
r( ∪ {e}) ≥ r( )
| (20) |
is sufficient to justify the greedy algorithm even when P is not a forest. An interpretation of this result in terms of submodular flows is given in [13]. The key idea in both papers to establish that, if 1 and 2 are ideals in P, then the antichain
≡ f( 1) + f( 2) f( 1 ∪ 2) is an antichain,
| (21) |
and
Further, every element in
( 1 ∩ 2) [ ] is covered by at least two elements.
| (23) |
From (20), (22), and (23), we infer that r([ ]) ≤ r( 1 ∩ 2), hence
r( 1 ∪ 2) + r([ ]) ≤ r( 1) + r( 2).
| (24) |
Instead of the arguments used in [11] and [12], we propose to include the result of [11] in the framework of our approach in Theorem 1 by slightly extending the definitions used therein. Note that Theorem 1, as it stands, cannot encompass [12] because (5) is false. On the other hand, if one wants to exploit (20)–(24), we have the difficulty that, in (20), [ ] is not 1 2.
Our tactic is to extend Theorem 1 by using the notion of pseudolattice. A partially ordered set P is a pseudolattice if, for every pair of elements (a, b) there is a designated element a * b satisfying a * b ≥ a, b, and there is a designated element a * b satisfying a * b ≤ a, b.
Further, if a and b are comparable, a * b = max(a, b), a * b = min(a, b).
Theorem 4
If, in Theorem 1, the term pseudolattice and the pseudolattice operations * and * are substituted for the term lattice and the lattice operations and , then the theorem remains valid.
The proof is simply to revisit the proof of Theorem 1 and observe that it remains valid in the more general setting without changing a single word. Further, with this change, we accommodate [12] by observing that when we define
1 ∨* 2 ≡ 1 ∪ 2
and
where is defined by (21), then (5) holds along with all of the other hypotheses of Theorem 4, so [12] is included in the general setting.
6. Some questions
-
It is easy to see that in the partially ordered set m < a, b, c < M with {a, b, c} mutually incomparable, no function f satisfying our hypotheses is possible. It would be nice to know what partially ordered sets permit functions f.
-
A theory for combining greedy linear programming problems on polytopes of the form
{x| x ≤ b, x ≥ 0} is presented in [17]. We have no appropriate theory if the polytopes are of the form {x| x = b, x ≥ 0}.
Acknowledgments
We thank Don Coppersmith, Milind Dawande, Jon Lee, and George Nemhauser for their contributions to this paper.
Received February 20, 2002;
accepted
for publication August 29, 2002; Internet publication December 9, 2002 |