Monotone Data Flow Analysis Frameworks
Monotone Data Flow Analysis Frameworks
Performing compile time optimization requires solving a class of problems, called Global data flow analysis problems (abbreviated as gdfap's), involving determination of information which is distributed throughout the program.
Monotone data flow analysis frameworks are used to analyze the flow of information through a program. They are based on the idea of propagating data flow information through the control flow graph (CFG) of the program. The analysis is called "monotone" because the information gathered at each point in the program is monotonically increasing(or decreasing) as the analysis progresses.
Lattice Theoretic
Definition: A flow graph is a triple , where:
- is a finite set of nodes.
- is a subset of called the edges. The edge enters node y and leaves node . We say that is a predecessor of , and a successor of .
- in is the initial node. There is a path from to every node.
Definition: A semilattice is a set with a binary meet operation such that for all :
Definition: Given a semilattice and elements, , we say that
also means and means . We extend the notation of the meet operation to arbitrary finite sets by saying
Definition: A semilattice is said to have a bottom element , if for all $ x \in L, \bot \wedge x = \bot$. is said to have a top element , if for all . We assume from here on that every semilattice has a bottom element, but not necessarily a top element.
Definition: Given a semilattice , a sequence of elements in forms a chain if $ x_i > x_{i-1}$ for . is said to be bounded if for each there is a constant such that each chain beginning with has length at most .
If is bounded, then we can take meets over countably infinite sets if we define , where , to be . The fact that is bounded assures us there is an integer such that .
Partial Orders and Greatest Lower Bounds
As we shall see, the meet operator of a semilattice define a partial order on the values of the domain. A relation is a on a set if :
- (the partial order is ).
- If and , then (the partial order is ).
- If and , then (the partial order is ).
The pair is called a , or . It is also convenient to have a relation for a post, defined as
Suppose is a semilattice. A (or ) of domain elements and is an element such that
- ,
- , and
- If is any element such that .
It turns out that the meet of and is their only greatest lower bound. To see why, let . Observe that:
- because .
- by a similar argument.
- Suppose is any element such that and . We claim , and therefore, cannot be a glb of x and y unless it is also . In proof: . Since , we know , so . Since , we know , and therefore . We have proven and conclude is the only glb of and .
Monotone Data Flow Analysis Frameworks
Definition: Given a bounded semilattice , a set of functions on is said to be a if the following conditions are satisfied:
[M1] Each satisfies the condition,
[M2] There exists an identify function in , such that
[M3] is closed under composition, i.e. , where
[M4] is equal to the closure of under the meet operation and application of functions in .
. Given a semilattice , let be a function on , then
. We shall assume and show that holds. Since is the greatest lower bound of and , we know that
Thus, by ,
Since is the greatest lower bound of and , we have .
Conversely, let us assume to prove . We suppose and use to conclude . Since is assumed, , by definition. We know that
Since is the of and , we know . Thus
implies .
. For any bounded semilattice and any countable set , if for all we have , then .
. Consider the meet of all elements in :
The meet is the greatest lower bound of all elements in . By definition, for all . If for all , then every element in is an upper bound for . , being the greatest lower bound of all , must also be a lower bound for . Therefore, . Hence, .
Definition: A Monotone data flow analysis framework is a triple , where
- is a bounded semilattice with meet .
- is a monotone function space associated with .
A particular instance of a monotone data flow analysis framework is a pair , where
- is a flow graph.
- is a function which maps each node in to a function in .
Some monotone data flow analysis frameworks satisfy the condition:
That is, each in is a homomorphism on . There are many interesting gdfap's which are monotone data flow analysis frameworks but which do not satisfy the distributivity property. The following are some examples.
Constant Propagation can be formalized as a monotone data flow analysis framework . Here , where is an infinite set of variables and is the set of all real numbers.
is the set of functions from finite subsets of to .
is the function which is undefined for all .
The meet operation on is set intersection.
Intuitively, stands for the information about variables which we may assume at certain points of the program flow graph. implies the variable A has value .
We define a notation for functions in based on the sequence of assignments whose effect they are to model.
- There are functions denoted and in , for each , and in , and .
Let . Then
, where for all ; is undefined unless and for some and in R, in which case .
Assume that represents the semilattice before that the program execution flow enters function , and the semilattice of the program execution flow after executing . Since is an assignment statement, so no matter what the variable changes, we always know all other variables will not be affected because the only variable affected is . Hence we can obtain for all . However, we still can determine the value of if and for some and in R, in which case .
where for all and .
- , where for all .
- if then .
Lemma 1. Let be a semilattice and be functions on . If it is true that , then .
. (by assumption). Suppose , then (by Observation 1). (by assumption). So by simple backward induction on , the lemma follows.
Theorem 1. is a monotone data flow analysis framework. Furthermore there exists and such that .
. The fact that is a bounded semilattice with a element is obvious. Furthermore, for any element for some integer , where is of the form . So to show that is a monotone function space associated with , it suffices by Lemma 1, to show that for all and all functions in of the form or ,
and
Observe that since is intersection on , the relation is set inclusion.
Suppose we are given and . Let . Then for all , if then and . Hence and .
If is undefined in , then we are done. Suppose however, that . Then is a subset of and is also a subset of , for some and such that . This implies that and
Suppose we are given and . It is straightforward to show that . Hence the first part of the lemma follows.
For a counterexample showing that is not distributive, consider the flow chart of Figure 1. There we see that , while .
□
We shall also mention that Theorem 1 can be generalized to any framework whose lattice elements associate "values" with variables, whose meet operation is intersection, and whose functions reflect the application of "operation" on those values and assignment of values to variable. The framework will be monotone in all cases, but will be distributive only if the interpretation of the operator is "free", that is, the effect of applying operator to two different of values is never that same.
The phrase "whose functions reflect the application of 'operation' on those values and assignment of values to variable" refers to how the transfer functions in the data flow analysis framework operate on the lattice elements. Specifically, it describes the behavior of these functions in terms of:
- Applying operations to the values associated with variables (e.g., performing arithmetic or logical operations).
- Assigning new values to variables based on the results of those operations.
This means that the transfer functions simulate the effect of program statements on the values of variables. For example:
- If a program statement is
x = y + z
, the corresponding transfer function:
- Applies the operation to the values of and .
- Assigns the result to .
In the lattice, this might look this:
- If is mapped to 3 and is mapped to 5, the transfer function updates to 8.
Approaches to Solving Monotone Data Flow Analysis Problems
It appears generally true that what one searches for in a data flow problem is what we shall call the solution. That is, let denote the set of paths from the initial node to node in some flow graph. The we really want for each . It is this function, the solution that, in any practical data flow problem we can think of, expression the desired information.
The paths considered in the solution are a superset of all the paths that are possibly executed. Thus, the solution meets together not only the data-flow values of all the executable paths, but also additional values associated with the paths that cannot possibly by executed.
Notice that in the solution, the number of paths considered is still unbounded if the flow graph contains cycles. Thus, the definition does not lend itself to a direct algorithm. The iterative algorithm certainly does not first find all the paths leading to a basic block before applying the meet operator. Rather,
- The iterative algorithm visits basic blocks, not necessarily in the order of execution.
- At each confluence point, the algorithm applies the meet operator to the data-flow values obtained so far. Some of these values used were introduced artificially in the initialization process, not representing the result of any execution from the beginning of the program.
Algorithm 1 (Essentially Kildall's Algorithm applied to a monotone framework)
. A particular instance of , where is a flow graph.
.
. Visit nodes other that in order $n_1, n_2, ... $ (with repetitions, and not fixed in advance). We node n by setting
where . The sequence has to satisfy the following condition:
If there exists a node such that after we have visited node in the sequence, then there exists integer such that . Also, if after visiting node , for all , then the sequence will eventually end.
For no loop CFG, the iterative algorithm only iterates once.
. Given instance of framework , if we apply Algorithm 1 to with the sequence , we say that the -th step of Algorithm 1 has been applied after we have visited node . Let be a node in . We let denote the value of right after step of Algorithm 1 has been applied.
. Given a particular instance of , we let denote , the function in which is associated with node . Let be a path in . Then we may use for . Note that is not in the composition.
Lemma 2. Given an instance of a monotone data flow analysis framework , if we apply Algorithm 1 to , the algorithm will eventually halt.
. It is a simple proof by induction on , the number of steps applied in Algorithm 1, that (by monotone property, ), for all nodes in . According to the condition on the sequence of nodes being visited, after we have applied the -th step of Algorithm 1, either there exists an integer such that for some node in G, or the sequence will halt. The facts that is bounded and that has only finitely many nodes guarantee that the sequence ends and the algorithm will eventually halt.
Theorem 2. Given an instance of framework , after we have applied Algorithm 1 to , we have , where .
. We want to prove by induction on that , where .
. is the only node that has a path from of zero length. Since is assigned initially and not changed afterward, the basic holds.
. If , we are done. Suppose . We have , and , by hypothesis. Thus by monotonicity and Observation 1. By monotonicity again, we have . By Observation 2, we have for all
Theorem 3. Given an instance of a monotone framework , after we have applied Algorithm 1, the solution 's we get is the maximum fixed point solution of the set of simultaneous equations
. It is obvious that, after Algorithm1 halts, the 's satisfy the above equations. Now suppose we are given any solution 's to the equation. We want to prove by induction on , the number of steps applied in Algorithm 1, that after the -th step for all .
. Obvious.
. At the -th step, we have
Corollary. Given an instance of a framework , as input to Algorithm 1, the 's we get after Algorithm 1 halts is unique independent of the sequence in which nodes are visited, provided the sequence satisfies the condition stated in the algorithm.
Theorem 4. Given a monotone framework , suppose , i.e. is not distributive. Then there exists an instance such that after we apply Algorithm 1, there is a node in such that
. By condition [M4] in the definition of a monotone function space, we can find acyclic graphs and , with input nodes and , and output nodes and , such that after that we apply Algorithm 1 to and , we get and . A straightforward induction on the number of meet operations and function applications necessary to construct a lattice element from proves the existence of and .
Consider the graph of the above figure. It is easy to check that if we apply Algorithm 1 we have . By Theorem 2, in we have and . Thus by monotonicity. But we are given , so .
In summary then, Kildall's algorithm applied to a monotone data flow analysis framework yields a unique solution, independent of the order in which nodes are visited. This solution is the maximum fixed point of the set of equations associated with a flow graph. However, we can only show that the solution is equal to or less than the solution, and when the framework is not distributive there will always be some instance in which the inequality is strict.
A Variant of Kildall's Algorithm
This algorithm will obtain the solution in certain situations where Algorithm 1 fails to do so. However, like Algorithm 1, it must fail for some instance of any monotone, non-distributive framework. It is interesting, however, to note that the two algorithms differ in their behavior in the general monotone case, although they are easily shown in produce identical answers for distributive frameworks.
Algorithm 2
. As in Algorithm 1
. Visit nodes other than in order (not fixed in advance). We node by setting
The sequence has to satisfy the condition: if there is a node such that after we have visited node in the sequence, then there exists integer such that . Also if for all , we will eventually halt the iteration.
. We set
Theorem 5. Given an instance of a framework as input to Algorithm 2:
Algorithm 2 will eventually halt. The result we get is unique independent of the order in which nodes are visited and .
The resulting 's form the maximum fixed point of the set of equations
If is the result of applying Algorithm 1 to , then .
. The proofs are similar to the proofs of the results in the previous section and we omit them.
Theorem 6. Given a monotone framework , suppose , i.e. is not distributive. Then:
there exists an instance such that there is a node in such that
after we have applied Algorithm 2 to .
there exists an instance such that there is a node in such that
after we have applied Algorithm 1 and Algorithm 2 to instance .
. We may, as in Theorem 4, invoke condition [M4] to observe that there are graphs and such that their output nodes and have and . Then consider the graph of the above figure. It follows from monotonicity that
To prove Theorem 6 (2), we refer to Figure 2. Direct calculation will show that . By part (3) of Theorem 5, and . Thus and . It follows that , so .
Undecidability of the MOP Problem for Monotone Data Flow Analysis Frameworks
We have seen that some of the obvious algorithms fail to compute the MOP solution for an arbitrary monotone framework. We shall now show that this situation must hold for any algorithm. In particular, we show that there does not exist an algorithm which, for arbitrary instance of an arbitrary monotone data flow analysis framework , will compute for all nodes of .
Definition. The is the following: Given arbitrary lists and , of strings each in , say
does there exist a sequence of integers such that
It is well known that MPCP is undecidable.
Example instance of MPCP. Consider the following two lists:
and
A solution to this problem would be the sequence , because
Furthermore, since is a solution, so are all of its "repetitions", such as , etc.; that is, when a solution exists, there are infinitely many solutions of this repetitive kind.
Association of Post Correspondence Problem with Monotone Data Flow Analysis Frameworks
To associate PCP with monotone data flow analysis frameworks, we can think of the problem in terms of data flow through a hypothetical program. Here's how:
- Control Flow Graph (CFG): Construct a CFG where each node represents a step in the PCP process (e.g., selecting an index from the lists).
- Data Flow Information: Define data flow information that represents the partial solutions to the PCP at each node. For example, at each node, we can track the current concatenation of strings from both lists.
- Monotone Propagation: Propagate this data flow information through the CFG. At each node, update the data flow information based on the selected index and the current concatenation.
- Termination Condition: Check if the data flow information at any node matches the target concatenation. If it does, a solution to the PCP has been found.
Given an instance of MPCP with lists and , we can construct a monotone data flow analysis framework , where the following elements are in :
- , the lattice zero,
- the special element , which will in a sense indicate non-solution to MPCP, and
- all strings of integers beginning with .
The meet on these elements in given by: whenever . Thus, if , then either or .
The set of functions includes the following.
the identify function on
functions , for defined by:
- if is a string of integers beginning with , then ,
- and
- ,
the function defined by
- for string ,
- ,
- ,
the function defined by (that is, the string consisting of alone) for all .
All functions constructed from the above by composition.
Lemma 3. constructed above is a monotone data flow analysis framework.
. We show only monotonicity; the other properties are easy to check. By Observation 1 and Lemma !, it suffices to show that if for and in then
- for ,
- , and
- .
Since , (3) is immediate. We have observed that for the meet operation we have defined, implies is either or . In the former case (1) and (2) are immediate.
In the latter case, and , so and follows.
Theorem 7. There does not exist an algorithm with the following properties.
- the input to is
- Algorithms to perform meet, equality testing and application of functions to lattice elements for some monotone framework and,
- An instance of the framework.
- The output of is the solution for I.
. If exists, then we can apply to any monotone framework constructed above with the instance . The MOP solution to that problem at point is easily seen to be if the instance of the MPCP has no solution, and if it does. Thus, if existed, we could solve the MPCP.
We can construct a CFG where each node represents selecting a index from MPCP lists, and data flow information represents the partial solutions to the MPCP at each node. If the data flow information obtained from different paths is different (which means there is no solution to this MPCP) , then we always obtain by definition of meet operator (meet on these elements in given by: whenever . Thus, if , then either or .). By definition of function , therefore we obtain , which will in a sense indicate non-solution to MPCP.
References
- Kam, J.B. and Jeffrey D. Ullman. Monotone Data Flow Analysis Frameworks, Tech. Rept. No. 169, Dept, of Elec. Engg., Princeton Univ., Princeton, NJ, 1975.
- Kildall, Gary A. A Unified Approach to Global Program Optimization, in [POPL73], pp. 194-206.
- Wikipedia - Post correspondence problem
- Muchnick and Steven S. Advanced compiler design and implementation.
- Aho, Alfred V. Compilers: principles, techniques, & tools.