ME290M, Spring 1999

ME290M
Expert Systems in Mechanical Engineering

Spring 1999, T-Th 12:30-2:00 pm
1165 Etcheverry Hall, Course Control No. 56369 http://best.me.berkeley.edu/~aagogino/me290m/s99


[ Home | Info | Syllabus | Readings | Students | Homework | Resources | News | Chat]


MANAGEMENT OF UNCERTAINTY WITH INFLUENCE DIAGRAMS

Alice M. Agogino
University of California at Berkeley
Working Paper #85-0703-6

1 INTRODUCTION

Human beings possess an unparalleled ability for processing complex types of information and for making inferences based on this knowledge. First generation expert systems try to capture human knowledge by means of logic predicates or If-Then statements that involve measurable and quantifiable parameters. Consider the following example from MYCIN, a well known rule-based medical diagnostic expert system (Buchanan and Shortliffe [1984:82]).

MYCIN RULE NO. 37:

IF:

1) The identity of the organism is not known with certainty, and

2) The stain of the organism is gamneg and

3) The morphology of the organism is rod, and

4) The aerobicity of the organism is aerobic

Then:

There is strongly suggestive evidence (0.8) that the class of the organism is enterobacteriaceae .

 

Following this rule, a conclusion about the class of organism present can be drawn (with a certain degree of confidence) from satisfaction of all of the antecedents. In the MYCIN expert system this uncertain degree of belief is represented in certainty factors. Positive certainty factors show support for the hypothesis "H" given the evidence "E1"; negative certainty factors show the degree of support against the hypothesis.

Fig. 1.1: Certainty Factor of Hypothesis "H" Given Evidence "E1"

If the evidence is also uncertain the degree of support for the hypothesis is expressed by the following expression, where E1 represents the uncertain evidence and CF(E1) the associated certainty factor for this evidence.

CF(H, E1) =max(0, CF(E1))• CF(H,E1) [equation 1.1]

The rules for combining evidence were introduced in Chapt. 9. The form below is equivalent to the use of separate factors for measures of belief and disbelief but requires less storage, where x=CF(H,E1) and y = CF(H,E2).

Fig. 1.2: Certainty Combination Function for Two Rules with Evidence E1 and E2 as Antecedents

In addition to explicit rules of inference like those in MYCIN, human beings also have unique skills in holistic reasoning, in the use of intuition, in deciding what parameters or variables are important to a problem, and in knowing what questions to ask when more information is needed. We are able to synthesize the interrelationships of complex problems and gain qualitative insights about them. If a computerized system is expected to capture any of this deeper human knowledge, it should at least be able to represent and process expert knowledge in both a symbolic and numeric fashion. As with the example from MYCIN, the inference mechanism and representation must be able to work with the uncertainties and imprecise data of both the material world and the human mind. The representation and management of uncertainty is a critical issue in the development of expert systems. The approach presented in this chapter is based on the intuitive graphical framework of influence diagrams coupled with the powerful Bayesian inference engine and automation algorithms supporting them.

1.1 Certainty Factors and Conditional Independence Assumptions

Rule-based expert systems using binary logic have trouble accounting for imperfections in the measuring instruments. Applications based on simple binary rules are prone to making false-positive and false-negative predictions. Spurious sensor readings and sensor degradation over time have been reported as major problems in rule-based systems ( Fox, 1983; Laffey et al., 1988; Rege & Agogino, 1986a).

Certainty factors have been used as a means to incorporate probabilistic knowledge in rule-based expert systems; the rules being weighted by a belief factor. In the medical diagnostic system MYCIN, the knowledge is stored in rules of the form "If Ei then Dj" where Ei represents evidence that supports disease hypothesis Dj. The certainty factor CF(Dj,Ei) associated with each rule ranges between -1 and 1. Positive certainty factors quantify relative (not absolute) support for the hypothesis and negative numbers quantify the relative support against the hypothesis.

Conditional independence is assumed in the combining rules that give the joint factor when multiple pieces of evidence support one disease hypothesis. Heckerman and Horvitz (1987) have shown that the MYCIN function for combining evidence is analogous to Bayes' formula assuming that each knowledge rule is conditionally independent (assuming a probabilistic interpretation of certainty factors). Thus the benefits of approaches using a simple weighting of rules must be compared against the loss in theoretical rigor and robustness (Cheeseman, 1985). All too often, joint and conditional dependencies are critical aspects of diagnostic reasoning. Probabilistic influences more often than not are highly interdependent .

There is growing support in the artificial intelligence research community for the use of influence diagrams (probabilistic influence diagrams are sometimes called Bayes' networks or belief networks) for representing uncertain knowledge in building expert systems (Heckerman & Horvitz, 1987; Henrion, 1987; Henrion & Cooley, 1987; Holtzman, 1985; Horvitz et al., 1986,88; Pearl, 1986). Influence diagrams provide an intuitive graphical framework for representing complex interdependencies and rules for combining evidence that are based on rigorous probability theory. The IDES (Influence Diagram Based Expert System) developed at the University of California at Berkeley (Agogino & Rege, 1987) is an implementation of influence diagram technology with efficient algorithms designed for real time applications.

1.2 Representing Conditional Independence with Influence Diagrams

Influence diagrams provide an attractive graphical scheme for explicitly codifying conditional independence between critical probabilistic variables as justified by the expert's knowledge and statistical data. The salient information in the diagram is, in fact, not which variables influence each other, but rather, which ones do not influence each other given the conditioning information. Thus influence is defined by its dual concept "lack of influence".

Definition 1: Consider two variables x and y. y does not influence x given C if Pr(x|y,C) = Pr(x|C) where C represents the state of information and other conditioning variables. The reverse implication is also true for all cases except the degenerate case where Pr(y|C) = 0.

In graph-theoretic terms, an influence diagram can be represented as a directed graph G = (V,A) where V is a set of n nodes and A is the set of directed arcs joining these nodes. Given n state variables x1, x2, . . . xn in the problem being modeled, there are n! expansions of the associated joint probability distribution. An influence diagram represents one expansion of the joint distribution which can be manipulated to realize other expansions through application of Bayesian probability theory. An expert may find one expansion more natural than others in codifying and conditioning subjective probabilities. Yet another expansion may be computationally more efficient in solving the diagram in order to answer a specific inference question. In manipulating the diagram during a solution sequence, any intermediate diagram must be "consistent" to the original in the following sense. (Interested readers are referred to Smith (1989) for a rigorous axiomatic development of this concept, using implication.)

Definition 2 : An influence diagram G' = (V',A') is said to be consistent with another influence diagram G = (V,A) if and only if, the set of explicit independencies IG' associated with G' can be derived from the set IG associated with G where V' is a subset of V (Rege & Agogino, 1986b,88).

Thus, for diagram G' to remain consistent with diagram G, there should not be any assertion about conditional independence between variables in G' which cannot be concluded from the information in the original diagram G. Consequently, the terms in the expansion represented by the new diagram should be computable from the terms in G. (Note, however, that it is possible for the information on conditional independence in G' to be a proper subset of that in G.)

Figure 1.3: Topologically Consistent World Views: (b) is consistent with (a)

(Rege & Agogino, 1988)

The concept of consistency provides us with an important tool to compare the information on conditional independence asserted by two experts, say 'A' and 'B', who provide different influence diagram models of the same problem domain. Consider the two influence diagrams shown in Fig. 1. The first, Fig. 1a, shows that expert A believes that the variable T influences the variables T1 and T2 which are conditionally independent of each other given T. Thus in the view of expert A the information flow, so to speak, is from T to T1 and T2. On the other hand expert B views the problem domain as in Fig. 1b implying that he is more comfortable with inferring the probability distribution for T given T1 and T2. Further, he asserts that in his view of the world T1 and T2 are not independent. If we treat the variables T1 and T2 as "observables" (e.g., readings of thermocouples for the same process) and T as a state variable that is not directly observable (the temperature of that process) we notice the cognitive differences in the thought processes of the two experts. Expert A sees the temperature as influencing the observable thermocouple readings T1 and T2; in other words, he knows the accuracy of each thermocouple given the process temperature. Expert B, on the other hand, feels more comfortable inferring the process temperature from the two thermocouple readings. In fact, he can make an inference about the reading on one thermocouple given the reading of the other (hence the arc from T2 to T1). According to Definition 2, expert B has a view topologically consistent with that of expert A, but his experiential knowledge may differ. Expert A might be a scientist who sees a causal mapping from process temperature to thermocouple readings (causal reasoning). Expert B may be a plant operator who has considerable experience in inferring the true process temperature from thermocouple readings (diagnostic reasoning).

Shachter and Heckerman (1987) note that a "causal" representation is often less complex in that it captures more information concerning conditional independence (as is the case in Fig. 1a), although this may be in the direction opposite of the intended usage. Kahnemann and Tversky (1985) also observe that the probability of consequence given cause (causal reasoning) is psychologically more readily available than the probability of cause given consequence (diagnostic reasoning). However, they caution that their experimental studies indicate that influences perceived as causal may exhibit a confidence bias unjustified by the data. The advantage of modeling with influence diagrams is that the model can be created in the direction that the expert has experienced the world, but can be transformed directly in order to answer a specific query. We go on to argue, that for real time applications, some of these transformations can be performed off-line so as to minimize the computational cost required in actual operation.

1.3 Mathematical Formalism of Influence Diagrams

Influence diagrams were conceived of and developed by researchers in the Decision Analysis Group at SRI International in order to automate the modeling of complex decision problems involving uncertain and incomplete information from a variety of sources (Miller, Merkhofer et al. [1976]). Knowledge of the interrelationships between variables is represented in a compact graphical and numerical framework which identifies the critical variables and explicitly reveals any conditional independence between them. Although the original objective in the development of influence diagrams was for computer-aided modeling, they have been used effectively in improving communication among people. Use of influence diagrams in participative modeling of complex decision problems is described in Owen [1978] and Howard and Matheson [1981]. A direct solution procedure to automate influence diagrams has been developed by Olmsted [1984] and formalized in algorithmic form by Shacter [1984a&b], Rege and Agogino [1986] and Gopalasamy and Agogino [1987]. The application of influence diagrams as a development tool for building expert systems was originally proposed by Agogino [1985] and Holtzman [1985].

An influence diagram is a knowledge representation that can be viewed at three hierarchical levels: (1) topological, (2) functional, and (3) numerical. At the topological level the nodes in the influence diagram represent the important variables in the system being modeled and the directed arcs identify conditional influences between them. From a graph-theoretic framework, the topology of an influence diagram can be described as an acyclic directed network. The nature of the influences between nodes in an influence diagram is specified at the functional level and further quantified at the numerical level. Formal calculi have been developed for deterministic functions and probabilistic relationships (based on either Bayesian or fuzzy probabilities). Only probabilistic influences will be described in this document. Readers are referred to Rege and Agogino [1986] for a formal description of fuzzy influences.

1.3.1 Topological Level

This is possibly the most powerful level of the diagram. It is at this level that the structure of complex problems can be extracted from domain experts. An influence diagram at the topological level is a graphical representation of the interrelationships between the variables involved in the problem under examination. It reveals visually the flow of information, influences, and the overall structure of the problem. (Note: Miller and Merkhofer et al. ,1976 use the word "relational" for the topological level. Both words will be used interchangably in this document.}

Three different kinds of nodes will be considered in building an influence diagram: 1) circular state nodes, 2) rectangular decision or control nodes and 3) diamond-shaped value nodes. Addition or deletion of nodes is analogous to similar operations on rules in rule-based expert systems but the effect of the same is much more apparent and intuitive with influence diagrams.

State Variables

Consider the following two-state variable influence diagram shown in Figure 1.4. The circular nodes represent two state variables, x and y, and the arc can be considered informally to signify that a conditional influence may exist. On the topological level, one interpretation of the influence diagram in Figure 1 is : "x may influence y". A variable x is said to influence a state variable y if given information about x tells you something about y. However, this should not be interpreted to mean that x causes y. No sense of causality is implied.

Figure 1.4: x Influences y

Conversely, if knowing x gives you no new information about state variable y, then variables x and y are said to be independent and no influence exists between them. This strong information about the interrelationship (or rather lack of) between variables x and y is signified by the absence of an arc between the corresponding nodes as illustrated in the influence diagram in Figure 1.5.

Figure 1.5: x is Independent of y

It should be noted that the lack of an arc is in some ways a stronger statement of our knowledge than the existence of an arc. The presence of an arc indicates that a possible dependency exists, while the lack of an arc states that no dependency exists. In other words, an arc can be thought of as representing our state of ignorance rather than our state of knowledge.

Decision Variables and Interpretation of Arcs

In addition to state variables, influence diagrams represent decision or control variables by rectangular nodes, following the same symbolism used in decision trees. The nature of an influence arc going into a state node and that going into a control node is conceptually and mathematically distinct. Arcs leading into state nodes represent conditional influences as discussed in the previous two paragraphs. Three examples of influence arcs into state nodes are shown in Figures 1.6ab&d. The influence can be deterministic, probabilistic or fuzzy. Arcs between state nodes can be reversed though legal topological transformations on the diagram according to Bayes' rule and providing a cycle is not introduced. Arcs going into decision or control nodes (Figure 1.6c&e) serve a different function and represent informational influences and show exactly which variables will be known by the decision maker or controller at the time that the decision is made. An arc from a decision node to a chance node indicates causality in the sense that the selection of one decision option over the other choices restricts, in general, the space or the universe of values the state variable can take on. Thus the meaning of an arc must be viewed relative to the types of nodes it connects.

Control nodes are the only nodes where the directions of the arcs are immutable aspects of the structure of an influence diagram and should not be reversed unless the intent is to reformulate the problem. Although not always shown explicitly, it is assumed that once a decision has been made, that decision is not forgotten. Hence the term no-forgetting arcs is used to signify the invisible but implicit arcs between decision nodes which must always appear sequentially in time (Figure 1.6e).

 

Figure 1.6: Six Interpretations of Arcs

Value Function

The diamond-shaped goal-value value node is used to signify the objective-goal of the decision maker or computer system user. There can be at most only one value-goal node in an influence diagram. For decision problems, the value node may be the cost (or profit) function of which the user hopes to minimize (or maximize). For an inference problem, the goal node is the object of a probabilistic query. However, as will be applied in the diagnostic and probabilistic inference examples, the location of the value node will vary depending on the question being asked of the expert system. Arcs into the single goal-value node signify which nodes (state or decision) directly influence the goal (Figure 1.6f).

Terminology and Definition of the Structural Matrix

In a more rigorous sense, but still at the relational or topological level, the influence diagram can be defined as an acyclic directed graph consisting of N variables represented by N nodes and a set of directed arcs between them. This topological information may be captured in a structural matrix that relates direct predecessors to each node in the diagram. A node "j" is called a direct predecessor of node "i" if and only if (iff) there is an influence arc directed from node "j" into node "i".

The NxN structural matrix A for an influence diagram of N nodes (each representing a distinct state or control variable) is a Boolean matrix of elements aij defined as follows:

A node "j" is called a direct successor of node "i" iff there is an influence arc directed from node "i" into node "j". The transpose of the structural matrix, AT, represents the Boolean matrix of direct successors in an influence diagram.

A directed path from node "i" to node "j" is the set of arcs connected head to tail that form a pathway between the nodes with the tail emanating from node "i" and head pointed to "j". A node "i" is a predecessor of node "j" if there is a directed path leading from node "i" to "j" . A node "i" is an indirect predecessor of node "j" if: (1) node "i" is a predecessor of "j" and (2) node "i" is not a direct predecessor of "j". Analogous definitions are given for successors and indirect successors. A node xi is a multi-path predecessor of node xj iff there exist more than one directed path from xi and xj.

The following terminology will be used to apply these concepts in this document. An influence diagram is defined as G=(V,A) where V is the set of N nodes and A the set of arcs joining these nodes:

V = set of N nodes, {xi: i=1,N}

A = set of arcs joining the nodes, {<xi,xj>: there is a directed arc from xi and xj}

P(xj} = the set of direct predecessors of xj, {xiŒ V:<xi,xj> ŒA}

Pt(xj) = the set of total predecessors (union of all indirect and direct predecessors) of xj. Similarly define S(xj) and St(xj) as the sets of direct and total successors, respectively.

P(B) = set of direct predecessors of all nodes in set B, i.e, the union of all P(xi) for all xiŒB.

Although this topological information may appear trivial, the existence or lack of influence and independence between variables is an important part of understanding any complex problem. Too often expert systems will assume independence at a certain level or between levels in hierarchical data structures, such as inference networks, to simplify the data storage and computational complexity. Various forms of conditional independence were assumed in PROSPECTOR (Duda et al. [1978] and Pednault et al. [1981]) and MYCIN (Shortliffe [1976] and Ishizuka et al. [1981]). The philosophy behind modeling with influence diagrams, is that this structural information is considered important domain knowledge of the problem, and should be captured along with more in-depth rules and numerical measures. This structural information is represented in a graphical fashion that experience has shown to be intuitive to many experts regardless of their level of mathematical proficiency. A sophisticated knowledge of the mathematical calculus behind influence diagrams is not needed to model at the topological level.

1.3.2 Functional Level

Suppose we have been successful in codifying expert domain knowledge graphically, and have an influence diagram at the topological level. How can we then use this topological knowledge stored in an influence diagram to assist or substitute human reasoning in a computerized expert system? In what follows, we shall outline the Bayesian or probabilistic approach to manipulating influence diagrams. It should be pointed out that influence diagrams on the topological level do not need a probabilistic basis to justify themselves. Rather probabilistic analysis is one approach to the deeper level interpretation of an influence diagram. Although not discussed in this paper, it is possible to incorporate other logics or inference techniques into the framework of the influence diagram (e.g., fuzzy logic and first-order predicate calculus) when appropriate.

Mathematical Definition of Influence on a State Variable at the Functional Level

The concept of influence pertains only to arcs leading into state variables (or into a state variable that for a particular query is designated as a goal node). The influence can be conditioned on either another state node or a control node. I will first define influence between state variables at the probabilistic functional level and later extend the definition to include control nodes. Consider N state variables x1, x2, . . . xN involved in the problem being modeled with influence diagrams. If the functional influences are probabilistic, the influence diagram represents one expansion of the joint probabilities of state variables denoted by Pr(x1,x2, . . . xN | H) where the inferential notation Pr(xi | xj, H) represents the probability distribution for xi conditioned on xj and the history or state of information "H". This joint distribution can be represented by any of N! expansions of conditional and marginal probability distributions, such as:

Pr(x1,x2, . . xN |H)=Pr(x1 |H) Pr(x2 | x1, H) Pr(x3 | x2, x1, H) . . Pr(xN | x1,x2, . . xN-1, H)

[equation 1.3]

There are N! such expansions of the joint probability distribution above (as illustrated in the next section for a three node problem), assuming no conditional independence between any of the variables. Influence diagram represent one expansion of the joint distribution of the system variables by means of a directed graph. Manipulations of the expansions at the mathematical level, map to corresponding transformations to the influence diagram at the topological level. Influence diagrams allow a separation of the graph or symbolic level of the system model from the mathematical or functional level.

Pr(x,y|H) = Pr(y|x,H)Pr(x|H) [equation 1.4]

This implies that we know both the unconditional or marginal probability distribution on x and the probability of y conditioned on x. Suppose, however, that our human expert or statistical data base can not directly produce the marginal probability distribution on x, but does have the conditional distribution of x given y. Suppose also that we can obtain the marginal distribution on y. Using the laws of conditional probability , we can still obtain the joint distribution on x and y using the following expansion:

Pr(x,y|H) = Pr(x|y,H) Pr(y|H) [equation 1.5]

This expansion is represented by the influence diagram shown in Figure 1.7. Although both Figures 1.4 and 1.7 are equivalent in the joint system they represent, they differ in their suitability for probability assessment.

Figure 1.7: y Influences x

Suppose now that x and y are independent in the sense that having information about x gives no additional information about y. Then the conditional probability of x given y, Pr(x|y,H), is equal to the marginal distribution on x, Pr(x|H). The expansion of the joint probability distribution, Pr(x,y|H), is then the product Pr(x|H)Pr(y|H) corresponding to the influence diagram shown previously in Figure 1.5. The lack of an arc between the state variables in Figure 1.5 graphically illustrates their probabilistic independence at the functional level as expressed by the equations, provided Pr(x|H) is not equal to 0):

 

Pr(y|x,H)=Pr(y|H)

[equation 1.6]

or equivalently,

   
 

Pr(x|y,H)=Pr(x|H)

[equation 1.7]

Acyclicity

Cycles are not permitted in influence diagrams. This is because a cyclic diagram does not represent any expansion of the joint distribution of the variables involved. On the functional level it would lead to an infinite loop in which information on a particular variable (say x) is needed to get information on x itself. In probabilistic terms, a cycle means there is no expansion of the joint probability distribution for the variables involved.

1.3.3 Numerical Level

The numerical form of the probability distributions can be of either discrete or continuous form. The numerical data base for any node, however, must be conditioned on all of the nodes with arcs directed into it.

2 CORRESPONDING PROBABILITY TREE REPRESENTATION

Consider an influence diagram with three uncertain variables, x, y, and z. There are 3! or six possible influence diagram representations.

Pr(x,y,z) =

Pr(z|x,y) Pr(y|x) Pr(x) [equation 2.1]

Pr(z|x,y) Pr(x|y) Pr(y) [equation 2.2]

Pr(y|x,z) Pr(z|x) Pr(x) [equation 2.3]

Pr(y|x,z) Pr(x|z) Pr(z) [equation 2.4]

Pr(x|y,z) Pr(z|y) Pr(y) [equation 2.5]

Pr(x|y,z) Pr(y|z) Pr(z) [equation 2.6]

Associated with each expansion is a different influence diagram, as shown in figure 2.1 on the following page. The first expansion (equation [2.1]) corresponds to the uppermost left diagram. The expansions in equations [2.2]-[2.6] are shown in the remaining five diagrams in Figure 2.1b-2.1f. The traditional approach using probability trees also yields six different tree configurations. The probability trees corresponding to the expansions in equations [2.1] and [2.6] are given in Figure 2.2a and Figure 2.2b respectively (the other four tree expansions are not shown). In representing equation [2.1] as a probability tree expansion, the first leftmost node must be the uncertain variable x, because it is the only one whose unconditional distribution is assumed to be known. Next, the conditional probability of y given x is added for each possible outcome of x. The third node of each branch corresponds to the conditional probability of z given both x and y. The joint probability distribution is obtained by multiplying the values along each branch of the tree.

Although a probability tree and an influence diagram may correspond to the same joint probability distribution, the influence diagram is much more compact and the conditional independence between variables is graphically obvious. If one influence diagram is inappropriate for probability encoding or if a different probabilistic inference is required, it is possible to manipulate the arcs in influence diagrams without violating the fundamental structure of the model. Topoplogical transformations on influence diagrams are described in the next section.

3 TOPOLOGICAL TRANSFORMATIONS

In the functional level description, we defined the probabilistic interpretation of the topology of influence diagrams. Now let us consider how Bayesian inference relates to topological transformations on an influence diagram.

3.1 Arc Addition

Recall that the existence of an arc signifies that an influence may exist. Therefore an arc can always be added between nodes as long as no cycles are introduced. A cycle implies that one can condition one's knowledge of a variable on that variable itself, resulting in an endless loop; a sort of probabilistic Kline bottle. The addition of an arc, however, results in a loss of (possibly vital) information regarding the conditional independence between the variables involved.

3.2 Arc Reversals

The framework for arc reversals was established in Section 1.2.1. We shall now study a heuristic justification of the rules for the same. Arc reversals in an influence diagram should be performed such that they are consistent with the probabilistic information the diagram represents at the numerical and functional levels. Any rules for arc reversals must be therefore justified in terms of the calculus of conditional probabilities. As an example, consider the arc reversal between nodes x and z in the influence diagrams in Fig. 3.1.

The expansion of the joint probability distribution represented in Figure 5a is

Pr(x,y,z ) = Pr(x ) Pr(y ) Pr(z | x, y) [equation 3.1]

As explained before, if we know the terms on the right side of [3.1], we can construct the joint distribution on the left. Consider now the influence diagram in Fig. 3.1b. The expansion in this case is:

Pr(x, y, z ) = Pr(y ) Pr(z | y) Pr(x | y, z) [equation 3.2]

Figure 2.1: Alternate Influence Diagram Expansions for Pr(x,y,z | H}

Figure 2.2: Alternate Probability Tree Expansions for Pr(x, y, z | H}

Figure 3.1: Examples of Arc Reversals

The problem facing us now is: Using only the terms on the right-hand side of [3.1], can we arrive at the terms in the right-hand side of [3.2]. Since Pr(y) is known, our problem is now reduced to expressing Pr(z | y) and Pr(x | y, z) in terms of the quantities in [3.1]. This can be done as follows ( Wx is the sample space for x):

 

Pr(z|y)

=

S Pr(z, x|y)

[equation 3.3]

 

Wx

 
 

=

S · (Pr(z|x, y) • Pr(x|y)

[equation 3.4]

 

Wx

 
 

=

S · ( Pr(z|x, y) • Pr(x))

[equation 3.5]

 

Wx

 

 

Using Bayes' rule:

Pr(x|y, z)

 

Pr(z|x, y) • Pr(x|y)

[equation 3.5]

 

=

_____________________

 
   

Pr(z|y)

 

 

 
   

Pr(z|x, y) • Pr(x)

 
 

=

_____________________

[equation 3.6]

   

Pr(z|y)

 

 

On the other hand, one might try to reverse the arc directly in Figure 3.1a to obtain the representation in Fig. 3.1c. Here the expansion is:

Pr(x, y, z) = Pr(y) • Pr(z|y) • Pr(x|z) [equation 3.8]

However, it would then not be possible to obtain the conditional probability Pr(x|z) from the quantities in [3.1] without violating the conditional independence shown in Figure 3.1a. Thus we see heuristically the consequence of the arc reversal is the addition of an arc from node y to node x. This brings us to the following rules of arc reversal.

RULE 1 (of arc reversals): An arc from state node "y" to state node "x" can be reversed iff "y" is not a multi-path predecessor of "x".

RULE 2 (of arc reversals): On reversal of an arc from y to x, node "y" inherits the direct predecessors of "x" and vice-versa.

The purpose of Rule 1 is to disallow any arc reversal that may cause a cycle to be introduced in the diagram. A node "y" is called a multi-path predecessor of node "x" if it has more than one path to y (e.g. see Fig. 3.1b). Clearly, the reversal of the arc from y to x without first reversing the arc from y to z in Figure 3.1b will cause a cycle, which as explained before, invalidates the diagram.

The example cited above for Rule 2 is of course not sufficient justification for the same. However, a little thought will show that the crux of obtaining the terms in the expansion represented by the new diagram is the joint distribution of the variables involved ( x and z in our example), conditioned on the set of nodes formed by the union of the sets of their direct predecessors. In other words, this set of nodes will always precede the variables involved in the reversal in the hierarchy of the expansion. Hence they will always occur as conditioned on the afore-mentioned set of predecessors. A formal proof for the same can be found in Olmsted [1984: Chapter 2] and Shacter [1984b:14-16].

Furthermore, it should be noted that arcs to and from decision nodes cannot be reversed. This arises from the informational nature of the arc to a decision node and the causal nature of the arcs from decision nodes to state nodes. Reversal of the former would violate chronological precedence and that of the latter, causality.

3.3 Preserving Conditional Independence

For an N node influence diagram and no conditional independence, there are N! possible expansions. Knowledge of the conditional and marginal probability distributions in any one of these expansions gives us the joint distribution which can then be used to obtain the other expansions. Symbolically, in the influence diagram format, this can be visualized as reversing the arcs in the original diagram to obtain the diagrams expressing the new expansion. However special care must be taken to insure that the conditional independence present in the original diagram is preserved in the transformed one. The order in which the arc reversals are performed is important in preserving this important structural knowledge. Let us review the example considered in the previous section concerning the probability expansion of the three-node problem given as equation [2.3].

Pr(x, y, z ) = Pr(y | x, z) Pr(z | x) Pr(x ) [equation 2.3]

This expansion can be simplified for the diagram shown in Figure 3.2 if we include the information that y is conditionally independent of x then Pr(y | x, z) = Pr(y | z) giving

Pr(x, y, z ) = Pr(y | z) Pr(z | x) Pr(x ) [equation 2.4]

 

Now suppose we agree that Figure 3.2 represents the structure of the problem, but we wish to encode the probabilities using the three-node expansion given in equation [2.5]:

Pr(x, y, z ) = Pr(x | y, z) Pr(z | y) Pr(y ) [equation 2.5]

Using the rules of topological transformation for influence diagrams we can reverse the arc from z to y if and only if we add an arc from x to y as shown in Figure 3.3a (from Rule 2, y inherits all of the direct predecessors of z).

We can now reverse the arc from x to z provided we reverse the arc from x to y (x inherits the direct predecessors of z, giving the influence diagram in Figure 3.3b. The resulting influence diagram represents the probability expansion in equation [2.5] as desired but does not preserve the original conditional independence between x an y shown in Figure 3.2. On a functional level, if Pr(y|x,z) = Pr(y|z) then application of Bayes' Rule gives Pr(x|y,z) = Pr(x|z). Thus information was lost in the topological transformations. However, if we had started with node x and had first reversed the arrow between z and x (Figure 3.4a), we could then reverse the arc between y and z without the addition of a new arc. The result is the influence diagram shown in Figure 3.4b with the conditional independence between x and y preserved. The corresponding equations for the latter arc reversal are shown as follows:

 

Pr(x,y,z)

=

Pr(y|z,x) Pr(z|x) Pr(x)

equation [2.5]

 

=

Pr(y|z) Pr(z|x) Pr(x)

conditional independence

 

=

Pr(y|z) Pr(z,x)

joint on x and z

 

=

Pr(y|z) Pr(x|z) Pr(z)

reverse arc between x and z

 

=

Pr(y,z) Pr(x|z)

joint on y and z

 

=

Pr(z|y) Pr(x|z) Pr(y)

conditional independence preserved

 

Figure 3.4: x is Conditionally Independent of y After Arc Reversals

3.4 Barren Node Removal

Although strictly speaking barren node removal need not form an integral part of the transformations to the influence diagram, we will refer to this operation since it is a convenient technique to visualise the flow of information graphically. A node is said to be barren if it has no direct successors. Furthermore, a node can become barren due to arc reversals. Such nodes (as a result of the topological transformations of the diagram) can be removed, and the diagram redrawn without changing the solution. Repeated application of this procedure leads to a shrinking influence diagram concluding in a representation of the solution to the inference or decision problem under consideration. Details of the procedures for solving influence diagrams will be given in the following section.

In expert system applications, barren node removal can be used as a mechanism to delete nodes and influences that are irrelevant to the specific question that is being asked of the system. The question and topological transformations needed to answer it, create what will be referred to as the current structure of the influence diagram.

3.5 Node Removal

Node removal at the topological level corresponds to summing over all possible states of the world for that state node. Although there are many topological transformations that one can derive for node removal, the following simple rule can be useful for many applications.

Node Removal Rule 1: Consider a node x which has P(x) as the set of direct predecessors and directly precedes only one node y (Figure 3.5a). The node can then be removed from the diagram by absorbing it into the node y. Node y then inherits all the direct predecessors of x (Figure 3.5b).

Figure 3.5: Node Removal Rule

The corresponding mathematics at the functional level for this rule of node removal is given below:

Pr(y | P(x), A)

=

S Pr(y , x | P(x), A, H)

 

Wx

 

=

S Pr(y | x, P(x), A) • Pr(x | P(x), A)

 

Wx

 

=

S Pr(y | x, A) • Pr(x | P(x))

 

Wx

4 SOLVING INFLUENCE DIAGRAMS

The "solution" of an influence diagram involves either 1) probabilistic inference or 2) decision making under uncertainty. In either case there must be some question to be answered as represented by the diamond-shaped value node.

4.1 Probabilistic Inference

P>Probabilistic inference is equivalent to determining the probability distribution Pr(G(X)|Y) where X and Y are sets of nodes given in the diagram. The function G(X) is analogous to a value node and is called the goal-value function for probabilistic inference. This new goal-value node G(X) is added to the diagram with conditioning arcs from the set X. Because an influence diagram can contain at most one value node at a time, all previous goal-value nodes are converted to state nodes or are deleted from the diagram. The procedure involves manipulating the influence diagram to obtain a representation of the problem under consideration. The relevant arcs are reversed until the resulting diagram includes the goal node and all conditioning nodes: the desired representation Pr(G(X)|Y). One such example is shown in Figure 4.1. Barren nodes may be removed along the way to obtain a "cleaner" diagram as shown in Figure 4.2.

4.2 Decision Analysis

The transformations or operations involved in decision analysis are the same as that for probabilistic inference; that is arc reversals and barren node removal. However the difference is that here instead of determining a probability distribution, we are computing the expected value (or utility) of a sequence of decisions, comparing them and assessing the optimal sequence by maximizing the expected value of the utility function represented by the value node. The topological or symbolic solution procedure is equivalent to ordering the nodes in the correct sequence(s) of integration of state nodes or optimization over decision nodes. As pointed out by Shacter [1984a:14] the steps involved at the numerical level are those used in evaluating a stochastic dynamic program (Bellman [1957]).

Note that the order so determined would correspond to one of the orderings that might be used in a decision tree based technique in which the variables in an expanded decision tree are either integrated or optimized over. The principle of dynamic programming allows pruning of suboptimal branches in this implicit decision tree. However, the influence diagram algorithm allows solution without the need of an intermediate tree representation. All intermediate calculations are performed directly from the influence diagram knowledge base and thus retain their original meaning.

4.3 Node Removal Concept

An equivalent way of looking at the above procedure is to consider node removal as a topological transformation that corresponds to manipulations of probabilistic functions. Nodes are removed in the order of integration/optimization described above. Rules can be formulated that allow one to develop a topological solution to the problem, without explicitly performing functional or numerical evaluations. If the topological transformations are to be useful, these rules need to have a basis in the probabilistic analysis of the problem. The rules described below were first proposed by Olmsted [1984] and then formalized in algorithmic form by Shacter [1984a:15-20].

Given an acyclic influence diagram the optimal sequence of decisions can be found by ordering or removing nodes from the diagram according to the following rules:

RULE 1 for State Node Removal: A state node "i" can be absorbed into the value node iff it directly precedes the value node and no other node. On its removal the value node inherits all the direct predecessors of "i". Absorption of the node is equivalent to integration over the corresponding state variable, i.e., conditional expectation.

RULE 2 for Decision Node Removal: A decision node "i' can be removed from the influence diagram iff it directly precedes the value node and directly succeeds all other direct predecessors of the value node. Removal of the node is then equivalent to maximization of the (conditioned) expected utility.

Therefore, combining these rules with appropriate arc reversals, the topological solution procedure can be determined. The justification of the rules can be found in the literature on decision making under uncertainty (see Raiffa [1970] or Bunn [1984]).

5 COMPUTER ASSEMBLY DIAGNOSTICS TUTORIAL

As an illustration of the use of influence diagrams, we present the Diagnostician's Inference Problem in the context of the automated assembly of microcomputers. In order to simplify the presentation, we will assume that during the final testing of a microcomputer, a system failure can be traced to failures in either of two components:

(1) the logic board or

(2) the I/O board.

Let us further assume that a sensor is available that gives rudimentary information about the operating status of the assembled microcomputer and is a function of the operating status of the logic board and the I/O board. The associated influence diagram is shown in Figure 5.1. This represents the Diagnostician's Problem on the topological level.

Figure 5.1: Probabilistic Influence Diagram

In the present structure of the influence diagram, the joint probability can be obtained from the conditional probability of the system status "S" and the marginal distributions on the state of the logic board "L"and the 1/0 Board "1/0" as given in the expansion below.

Pr(S, L, I/O) = Pr(S|L, I/O) • Pr(L) • Pr(I/O) [equation 5.1]

Note that L and I/O are conditionally independent (there is no arc between them) and thus their joint probability distribution is the product of each marginal distribution.

The nature of the influences is specified at the functional level. The influence diagram in Figure 5.1 implies that the conditional distribution on S is known along with the marginal distributions on L and I/O. Let us assume for illustration that the test for system status gives a deterministic result based on the status of the I/O and logic boards: if any of these boards (or both) is in a failure state the system status will show a failed system state. Using a subscript o f "0" for a failed state and "1" for the operational state, this implies on the numerical level that the conditional distribution of the system status, S is:

Let us further assume that the marginal probability of failure is 5% for the logic board and 1% for the I/O board, giving the marginal probability distributions:

 

Using the expansion in equation [5.1] the joint probability distribution associated with influence diagram 5.1 is

Note that the joint distribution in [5.5] can always be derived from the conditional and marginal distributions designated in the influence diagram and expressed numerically in equations [5.2-5.4]. Thus the joint distribution need not be stored but can be derived when necessary.

5.1 Probabilistic Reasoning

The diagram in Figure 5.1 represents the diagnostician's knowledge of the problem and system interrelationships. In diagnostic reasoning, however, we are more apt to want to know the cause of the failure given information about the system status. One question that might be asked is: What is likelihood that the logic board has failed given a system failure has occurred? In this case, the evidence E = S = S0 the event that there is a system failure. The joint probability distribution on L and I/O given the evidence can be obtained mathematically by use of conditional probability:

 

Pr(L,I/O,E)

Pr(L|I/O,E) • Pr(E|I/O) • Pr(I/O)

 

Pr(L,I/O|E) =

________________ =

__________________________

[equation 5.6]

Pr(E)

Pr(E)

Equation [5.6] is summed (or integrated for continuous probability distributions) over all possible states of the I/O board to find the conditional probability of the logic board status given a system failure, Pr(L|E=S0):

Pr(L|E)= S Pr(L,I/O|E) [equation 5.7]
  I/OŒWI/O  

where WI/O is the sample space for the possible states of the I/O board.

This is an example of probabilistic reasoning. On the topological level of the associated influence diagram, equation [5.6] corresponds to a sequence of arc reversals leading to the influence diagrams shown in Figure 5.2a. The logic board node, L, is overlayed with a diamond to signify that this is the goal of our probabilistic reasoning. If the evidence E is known with certainty the probabilistic node "S" is now deterministic and updated with the new information on the system status. Because S is no longer probabilistic, we are only interested in events that are conditioned on this new information, i.e., we want to reverse all arcs leading into S. In Figure 5.2a the arc between L and S is reversed and node L inherits all of the direct predecessors of S, which in this case is the I/O node. Next the arc between I/O and S is reversed in Figure 5.2b without the addition of any new arcs. Finally the I/O node is absorbed in Figure 5.2c by summing over the states of the I/O board according to Rule 1 for state node removal.

 

Figure 5.2: Solve for Pr(L|E)

In numerical terms, the topological transformation shown in Figure 5.2a can be considered a combination of two steps as shown in Figure 5.3b&c:

(1) Aggregation of the two nodes on each side of the arc to be reversed. Aggregation corresponds to calculating the joint probability distribution of the nodes conditioned on the union of the predecessors of both. In this case, aggregation means finding the joint probability distribution of S and L conditioned on I/O {Figure 5.3b}.

(2) Deaggregation of the joint distribution by conditioning it on the node that the new arc will be directed away from. In this case, the joint distribution should be conditioned on S given I/O {Figure 5.3c}. The denominator in the deaggregation step is obtained by summing (or integrating for continuous distributions) the joint distribution from the aggregation step over the outcome space of the node the new arc will be directed towards (in this case, L). Although shown graphically in Figure 5.3 as separate steps, all information in the original diagram must be stored until the arc reversal transformation is complete.

Figure 5.3: Temporary Aggregation in Arc Reversal

At the numerical level, the aggregation step corresponds to the following expansion using equations [5.2] and [5.3]:

Pr(L, S|I/O) = Pr(S|L, I/O) • Pr(L|I/O) = Pr(S|L, I/O) • Pr(L) [equation 5.8]

 

The deaggregation step involves conditioning this joint distribution on the conditional distribution on S.

 

   

Pr(L, S|I/O)

 

Pr(L| I/O, S)

=

_____________________

[equation 5.10]

   

Pr(S|I/O)

 

 

 

The numerator in equation [5.10] is the joint probability distribution in equation [5.9]. The denominator Pr(S|I/O) is obtained by "integrating" the joint probability Pr(L, S|I/O) from equation [5.9] over the state space of L.

  Pr(S|I/O) =

S Pr(L,S|I/O) [equation 5.11]
over all LŒWL  

Substituting in the numerical values from equations [5.9] and [5.12] into equation [5.10] gives the conditional distribution on L:

 

The influence diagram in Figure 5.2b represents an arc reversal between S and I/O. The new conditional probability Pr(I/O|S) is obtained from Pr(S|I/O) and Pr(I/O) as represented in Figure 5.2a. This can also be thought of as aggregating nodes I/O and S and then deaggregating by conditioning the joint distribution on S.

 

Pr(S|I/O) • Pr(I/O)

Pr(S, I/O)

 

Pr(I/O|S) =

________________ =

__________________________

[equation 5.14]

Pr(S)

Pr(S)

 

The denominator in equation [5.14] can be obtained by summing the joint distribution Pr(S, I/O) over all values of I/O.

Substitution into equation [5.14] gives:

Figure 5.2c represents the conditional probability distribution Pr(L|S) and the marginal distribution Pr(S).

  Pr(L|S) =

S Pr(L|I/O, S) • Pr(I/O|S) [equation 5.17]
 

for all LŒWI/O

 

On a topological level, summation or integration corresponds to absorption of nodes. In this example, after the arc reversals shown in Figure 5.2a&b are performed, the 1/0 node is absorbed leaving the influence diagram in Figure 5.2c.

Suppose that we are given that a system failure has occurred i.e., S=S0. Let us define this as event E= S=S0. Thus the likelihood that the logic board is the cause of the failure may be quantified by Pr(L=L0 |E= S=S0) From equations [5.10], [5.16], and [5.17]:

Pr(L=L0 | E= S=S0) = Pr( L0 |E, I/O0) • Pr( I/O0 | E) + Pr( L0 |E, I/O1) • Pr( I/O1 | E)

= (0.05) • (0.1681) + (1) • (0.8319)

= 0.8403

5.2 Decision Problem

Let us now look at a decision problem. The Diagnostician's Decision Problem is to decide which component should be tested for failure first and repaired if appropriate. In an automated assembly operation, this could mean deciding where the computer should be sent for "rework". If the diagnostician is wrong in his or her decision for rework, valuable time would have been wasted by the rework technician and hence in producing the final product. For purposes of illustration, let us assume that the costs of rework, which involves opening the computer and pulling out the appropriate board, testing, and repairing it, is a constant for each board. Let us define the relevant costs to be minimized as follows:

 

$DebugL = $150 for the logic board debugging

$DebugI/O   = $100 for the I/O board debugging

$RepairL   = $50 for the logic board repair after debugging

$RepairI/O = $100 for the I/O board repair after debugging

 

The influence diagram in Figure 5.4 has been modified to show these decision and value nodes. It has been assumed that a system failure has occurred and this information is known at the time that the rework decision is made.

Figure 5.4: Diagnostician's Problem

Recall that there is an implicit arc between all decision nodes and the value node. This "No-Forgetting Arc" has been added to the influence diagram in Figure 5.5.

Figure 5.5: Addition of "No Forgetting Arc" to Value Node

The diagnostician has two choices given that the system status shows a failure (E=S=S0):

(1) DL = Send the logic board to rework first and

(2) DI/O = Send the I/0 board to rework first

Each of the two decisions has three possible outcomes corresponding to the possible states of the boards given that a system failure has occurred S=S0: (1) L0I/01, (2) L1I/00, and (3) L0I/00. If the wrong board is sent to rework, then debug time has been wasted on that board and the other board must be sent to be debugged and reworked. It is also possible that the board sent to rework has failed but the other board has failed also and must be repaired. The two decisions, possible outcomes, and cost consequences are shown in the decision tree in Figure 5.6.

Figure 5.6: Decision Tree

From the influence diagram in Figure 5.5 and the decision tree in Figure 5.6, we see that the rework decision influences the value node. If this were not the case the decision would be irrelevant (a "worry" over which the decision maker has no control). Furthermore, the outcome of the rework decision also influences the cost. This is due to the fact that though we pick one of the two boards (say 1/0) for rework, it could well turn out that the other board (L in this case) has really failed. In such a case, the cost of debugging and reworking the second board would have to be added to the original expenditure. Given that a system failure has occurred, which board should be tested first?

5.3 Diagnostician's Problem Solved

The Boolean matrix of direct predecessors for the problem (structural matrix) is shown in Figure 5.7. Applying the rules for influence diagram manipulation, discussed in Section 4, we get the topological or symbolic solution as the following sequence of node removals: R, L, I/0, S, and D. The modified influence diagram in the above stages of node removal is shown in Fig. 5.8.

Let us now work out the problem on the numerical level, using the following probabilistic data given

Figure 5.7: Structural Boolean Matrix of Direct Predecessors

previously (equations 5.2, 5.3 and 5.4) and as represented by Figure 5.5 on the topological level:

If we assume that the debugging test is "perfect" and influenced only by the states of the logic board, I/O board, and rework decision as shown in Figure 5.5, the conditional probability distribution of the results "R" is the trivial deterministic function of the actual states of the logic and I/O boards assuming any initial debug decision is made:

The cost function (value node) is a function of the rework outcome "R" and the rework decision "D".

C(R, D) = $Debug (R, D) + $Repair (R) [equation 5.18]

 

Now that the influence diagram has been defined on the topological, functional and numerical level, let us proceed with the numerical solution. The topological solution shown in Figure 5.8 of the influence diagram corresponds to the following calculations on the numerical level.

Figure 5.8:Topological Solution of the Diagnostician's Decision Problem

Step (a) in Figure 5.8: Absorb Rework Results Node "R"

where the function "<•>" represents the expected value of its argument and WR is the outcome space on the results, R. Because the rework outcome "R" is a deterministic function of the decision "D" and the states of the I/0 and logic board, the conditional expected value of C(D, L,I/O) reduces to:

Step (b) Figure 5.8: Reverse Arc Between L and S

This step was performed in Section 5.1 with the resulting probability distributions given by equations [5.13] and [5.14].

Step (c) in Figure 5.8: Absorb L Node:

Step (d) in Figure 5.8: Reverse Arc Between S and I/O

This step was performed in Section 5.1 (Figure 5.2b) with the resulting probability distribution Pr(I/O | S) as given in equation [5.16].

Step (e) in Figure 5.8: Absorb the I/O Node

Step (f) in Figure 5.8: Absorb deterministic node S into C. Since we have assumed S0 has occurred anyway, there is no change.

Step (g) in Figure 5.8: Optimize over D

Which rework decision minimizes the expected cost? From equation [5.24] above, clearly the best decision under the circumstances is to rework the logic board first. Pick DL.

5.4 Value of Imperfect Information (Value of Testing)

A generalization of the Diagnostician's Problem is the addition of another test or sensor that gives more information about the state of the boards without having to go through expensive debugging procedures. Unfortunately the test is imperfect and the diagnostician must still make a decision under uncertainty after viewing the results of this imperfect test. To make matters worse, this test is not free and thus the costs of the additional test must be weighed against its benefits in reducing the uncertainty in the decision. The decrease in expected value due to the additional sensor is called the expected value of additional testing.. In other applications, the expected value of adding a new test is sometimes called the value of imperfect information or value of clairvoyance depending on the circumstances. A schematic representation of this problem is given in Figure 5.9 (the implicit "no-forgetting arcs are not shown). The influence diagram solution procedure remains the same, except the optimal result will now include two decisions: (1) DT=testing decision and (2) DR=rework decision.

Figure 5.9: Value of Additional Testing

(Note: There are implicit "no-forgetting" arcs between decision nodes and the value node)

6 AUTOMATION OF INFLUENCE DIAGRAMS

The IDES (Influence Diagram Based Expert System) is an expert system development tool based on the automation of influence diagram technology described previously. Because it is written in the compiled language C, IDES is transportable to a large number of mainframe and micro computer systems. The IDES system provides several levels of automation and inference from the graphical influence diagram representation: (1) Bayesian inference, (2) optimization, and (3) sensitivity analysis. Principles of dynamic programming and influence diagram logic allow efficient numerical optimization of the decision or control problem without ever having to consider intermediate representations such as fault trees, simulation models, or decision trees. An upper bound on the value of additional information or testing can be evaluated through use of IDES's unique features for simulation and sensitivity analysis. Details are following. Figure 6.1 shows a schematic outline of the various modules of IDES and their interaction with each other.

Figure 6.1: IDES Structure

6.1 IDES Architecture

The IDES (Influence Diagram based Expert System) architecture is highly modular in form. Constructed on the expert system paradigm requiring separation of control and data, the domain knowledge base is separated from the solution procedure. In addition, the control strategy for the solution is domain-independent. The knowledge base consists of the influence diagram at all three levels of specificity. The control strategy consists of algorithms that create a sequence of transformations in order to answer a user-directed probabilistic inference or control query. This separation of the knowledge base from the control strategy provides a flexibility in the architecture that allows the user to change the structure and the parameters of the model in real time.

The IDES architecture consists of separate modules for knowledge base development, user interaction, control, and co-ordination of the modules (Fig. 6). Inform is the module for developing the influence diagram representation of the model by interacting with the experts or modelers ([Moore:1987] and [Lambert:1987]). IDES Interface is the module which co-ordinates all of the modules and is the interface with the user. Topo and numer are the modules for control of the solution at the topological (or symbolic) and numerical levels, respectively. Simul is a simulation module for testing the efficiency of solution algorithms. In the schematic of the IDES architecture given in Fig. 6 control lines are solid and data lines are dotted.

6.2. Inform Interrogator

Encoding diagnostic knowledge as influence diagrams has specific advantages over rules in conventional expert systems even when certainty factors can be specified for each rule [Heckerman :1987]. In particular, certain classes of dependencies cannot be modelled in a natural or efficient manner unless strong forms of conditional independence are assumed. The IDES implementation explicitly allows the possibility of aberrations and failures of sensors themselves. Influence diagrams capture not only "heuristic" or "shallow" knowledge but also attempt to encode explicitly the structure of the physical model in the mind of the expert. Finally, the graphic schema used leads to a compact and easily understood representation.

Inform is the interactive mechanism of IDES which elicits information from the expert/user on the topological structure of the influence diagram which represents a model of the problem under consideration as well as the functional relationships and numerical probabilities associated with it. The present version deals with discrete probability distributions. The Inform interrogator constitutes an important part of the IDES inference mechanism and current effort is directed to make it more intelligent and user-friendly. Further and current work includes:

  • Knowledge Acquisition: Helping the expert to model the problem
  • Expert Consultant: Explaining the 'hows' and 'whys' of its inferences
  • Dealing with continuous distributions
  • Graphics interface

6.3 Topo: Topological Solution Module

IDES converts the influence diagram into what might be called a "reduced decision" tree with the redundant branches pruned during the interrogation process itself, thereby eliminating unnecessary computations. This module therefore involves a solution based on the topology alone w/o any numerical computations i.e. the solution is symbolic rather than numeric. The solution is equivalent to ordering the state and decision variables in a sequence same as that of the corresponding decision tree. Thus, once the order in which to integrate chance nodes and optimize over decision nodes is known, the actual computations are relatively straightforward.

6.4 Numer: Numerical Computation Module

This module uses information from the previous modules, i.e. the actual distributions obtained from the user and the symbolic solution obtained to perform a decision analysis based on Bayesian methods. The optimal decisions so obtained can be displayed on request.

6.5 Sensitivity Analysis

Once a problem has been modeled and solved for a given set of probability distributions and payoffs, IDES stores the structure as well as the solution procedure so that changes in the numerical values can be made easily to determine the sensitivity of the optimal decisions to certain variables. The interrogation and computation time are both reduced significantly, thus enhancing the prospects of using IDES for real-time situations such as flexible manufacturing and automation. The sensitivity analysis module can be used on an appropriately structured influence diagram to find the expected value of information or value of additional testing.

7 IDES INFERENCE ENGINE

Inference, in response to a specific query to the influence diagram, is performed by transformations on the diagram that reduce it to the topological structure represented by the query. Although many kinds of transformations are possible, the IDES (Influence Diagram Based Expert System) is based on the following two:

Node Removal (single arc corollary): A node x with direct predecessors Dx which precedes only one node y can be removed from an influence diagram by absorbing it into the node y. Node y then inherits all the direct predecessors of x. Node absorption is shown schematically in Figure 7.1, where Dy represents the set of direct predecessors of y, excluding node x. Note that the sets Dy and Dx may have node elements in common. The value of an observable sensor node is simply propagated into its successor and no summation is necessary for its absorption.

Figure 7.1. (a) Before and (b) After State Node Removal

Node absorption corresponds to summation of the joint probability over all possible states of the conditioning node, Wx:

Arc Reversal: An arc from state node x to state node y can be reversed, provided x does not have more than one path to y (so as not to cause a cycle). On reversal of the arc from x to y, node y inherits all the direct predecessors of node x and vice versa (Fig. 7.2). Arc reversal corresponds to application of Bayes' rule for conditional probability.

Figure 7.2. (a) Before and (b) After State Node Arc Reversal

Control Node Removal: If a control node directly precedes the value node and if all other conditioning predecessors of the value node are also informational predecessors of the control node, then the control node may be removed by maximizing the expected utility of the value node, conditioned on the values of the informational predecessors. The optimal control strategy is assumed for all subsequent operations on the diagram [Shachter:1987].

8. "GREEDY" PROBABILISTIC ALGORITHM

Input :

An influence diagram G = (V,A), the goal node xi and the conditioning nodes xJ ; i œ J

Output: An influence diagram G' consistent with G representing Pr ( xi | xJ )

Let J' = {i, J}, nodes to be retained; K' = {1,2, ..., n } \ J', nodes to be removed.

Let K be the "current" version of K'. Initially K = K'

1. Barren node removal :

Remove all nodes in K which have no successors - update the successor sets for nodes which directly preceded the barren nodes. If K is empty go to step 4.

2. State node removal :

Remove all nodes in K which have only one successor each. The removal is done in order of number of predecessors - with the node having the least number of predecessors removed first. Update successors, predecessors, etc. Repeat till all nodes in K with only 1 successor are exhausted. If K is empty go to step 4 else go to step 3.

3. Arc reversal :

Select the node in K with the least number of successors (i.e., least out-degree). Let this node be x.

* Pick a successor of x, check if there are more than one paths from x to that successor. If yes then reject it, else say y is that successor. Reverse the arc <x,y> with concomitant updating of predecessor, successor sets etc. Check if x has only one successor - if so remove x and if K is not empty go to step 2. If K is empty go to step 4. Otherwise go back to *.

4. Arc reversal from goal node :

For every successor of xi, check if there are more than one paths from xi to that successor. If yes then reject that successor, else keep track of successor which has the least number of predecessors. (say y). Reverse <xi,y> with concomitant updating of successor and predecessor sets. Repeat step 4 till xi has no successors.

9. IDES Greedy Decision Algorithm (Topological Solution of Controller's Problem)

Input: An influence diagram G = (V,A), the value node xn, the sequentially ordered control/decision nodes dJ ; v œ J, state nodes of known values xk ; j, K and relevant state nodes of unknown values xU; k, j, U.

Output: An influence diagram with a single node, the value node xn, and an array D of dJ*, the optimal decision sequence.

Let M' = {1,2, ..., n } \ n be the nodes to be removed (all nodes but the value node for a decision problem). Let M be the "current" version of M'. Initially M = M' .

1. Barren node removal :

Remove all nodes in M which have no successors (barren nodes). Update the successor sets for nodes which directly preceded the barren nodes. If M is empty, then end, else go to step 2.

2. Known sensor node reduction:

Propagate values of known sensor nodes in K to their non-decision node successors and remove all successor arcs. Remove the known sensor nodes with no predecessors. Eliminate any disjointed nodes that are made barren, by identifying the nodes in the graph that are unconnected to the value node. If M is empty, then end, else go to step 3.

3. State node removal :

Remove all state nodes in M which have only one non-decision node successor each. The removal is done in order of number of predecessors - with the node having the least number of predecessors removed first. Update successors, predecessors etc. Repeat till all nodes in M with only one non-decision node successor are exhausted. If M is empty, then end, else go to step 4.

4. Control node removal:

Remove the control/decision node in J which directly precedes the value node if all other conditioning predecessors of the value node are also informational predecessors of the control node. Remove all informational arcs to the absorbed control node and remove any of these nodes that are consequently disjointed from the graph and made barren, by identifying the unconnected elements of the graph. If M is empty, then end, else if any control node is removed, then go to step 3, else go to step 5.

5. Arc reversal :

Select the state node in M with the least number of non-decision node successors. Let this node be x.

* Pick a state node successor of x (excluding xn), check if there are more than one path from x to that successor. If yes then reject it, else say y is that successor. Reverse the arc (x->y) with concomitant updating of predecessor, successor sets etc. Check if x has only one non-decision node successor - if so go to step 2. Otherwise go back to *.

10 POTENTIAL APPLICATIONS

As illustrated in the Diagnostician's Problem, influence diagrams can be applied to expert systems requiring both reasoning under uncertainty (probabilistic inference) and advising (controlling at the machine level). New knowledge can be added or subtracted from the data base without adjusting the basic Bayesian control structure. The sensitivity analysis module can be used to add new evidence and test failure hypotheses in real-time diagnostic applications in automated manufacturing environments.

11. SUMMARY OF FUTURE RESEARCH

  • Parallel processing
  • Real-time supervisory control
  • Sensor fusion, diagnostics and monitoring
  • Learning systems with sensor input
  • Add transformations for fuzzy nodes and fuzzy logic, alternate belief functions, and deterministic predicates.

12. REFERENCES

Agogino, A.M. (1988) ed., Research Summaries of the Berkeley Expert Systems Technology Laboratory, Summer 1988, Dept. of Mechanical Engineering, University of California, Berkeley, CA 94720.

Agogino, A. M., S. Srinivas and K. Schneider (1988a) Mechanical Systems and Signal Processing, 2 (2), 165-185. Multiple sensor expert system for diagnostic reasoning, monitoring and control of mechanical systems.

Agogino, A. M., R. Guha and S. Russell (1988b) Artificial Intelligence in Engineering: Diagnostics and Learning, Computational Mechanics Publications, Southhampton, 333-357. Sensor fusion using influence diagrams and reasoning by analogy: Application to milling machine monitoring and control.

Agogino, A. M. and A. Rege (1987) Mathematical Modelling 8, 227-233. IDES: Influence Diagram based Expert System.

Altintas, Y., I. Yellowsley and J. Tlusty (1985) Sensors and Controls in Manufacturing, ASME-PED 18, 41-48. The detection of tool breakage in milling. (Note : The name of Y. Altintas was mis-spelled as Y. Attanis in the publication.)

Braun, S., J. Rotberg and E. Lentz (1987) Mechanical Systems and Signal Processing 1(2), 185-196. Signal processing for single tooth milling monitoring.

Bronstein, M. (1987) (Berkeley Expert Systems Technology Lab., Dept. of Mechanical Engineering, University of California, Berkeley.) Documentation for IDES: Influence Diagram based Expert System.

Cheeseman, P. (1985) Proceedings of the Ninth International Joint Conference on Artificial Intelligence, AAAI 2, 1002-1009. In defense of probability.

Diei, E. N. and D. A. Dornfeld (1985a) Sensors and Controls in Manufacturing, ASME-PED 18, 75-84. Acoustic emission from the face milling process - the effect of process variables.

Diei, E. N. and D. A. Dornfeld (1985b) Sensors and Controls in Manufacturing, ASME-PED 18, 33-39. A model of tool fracture generated acoustic emission during machining.

Dornfeld, D. A. (1984) Proceedings of the Symposium on Sensor Technology on Untended Manufacturing, SME. The role of acoustic emission in manufacturing process monitoring.

Dornfeld, D. A. (1985) Proceedings 12th NSF Conference on Production Research and Technology, SME. Acoustic emission monitoring and analysis of manufacturing processes.

Fox, M. S. (1983) Proceedings of the Eighth International Joint Conference on Artificial Intelligence, AAAI 1, 158-163. Techniques for sensor-based diagnosis.

Heckerman, D. E. and E. J. Horvitz (1987) Proceedings of the Sixth National Conference on Artificial Intelligence, AAAI 1, 121-126. On the expressiveness of rule-based systems for reasoning with uncertainty.

Henrion, M. (1987a) Proceedings of the Third Workshop on Uncertainty in Artificial Intelligence, AAAI 132-139. Practical issues in constructing a Bayes' belief network.

Henrion, M. and D. R. Cooley (1987b) Proceedings of the Sixth National Conference on Artificial Intelligence, AAAI 2, 471-476. An experimental comparison of knowledge engineering for expert systems and for decision analysis.

Holtzman, S. (1985) Ph.D. Dissertation, Engineering-Economic Systems, Stanford University. (Reprinted by Strategic Decisions Group, Menlo Park, California.) Intelligent decision systems.

Horvitz, E. J., D. E. Heckerman and C. P. Langlotz (1986) Proceedings of the Fifth National Conference on Artificial Intelligence, AAAI 1, 210-214. A framework for comparing alternative formalisms for plausible reasoning.

Horvitz, E. J., (1987) Proceedings of the Third AAAI Workshop on Uncertainty in Artificial Intelligence, 412. Reasoning about beliefs and actions under computational resource constraints.

Horvitz, E. J., J. S. Breese and M. Henrion (1988) (in press) International Journal of Approximate Reasoning. Decision theory in expert systems and artificial intelligence.

Howard, R. A. and J. E. Matheson (1984) The Principles and Applications of Decision Analysis, 2, Strategic Decisions Group, Menlo Park, California. Influence diagrams.

Kao, S., Laffey, T., Schmidt, J., Read, J. and L. Dunham (1987) Proceedings of ESIG - Third Annual Expert Systems in Government Conference, Washington, D.C., Oct 19-23. Real-time analysis of telemetry data.

Laffey, T. J., P.A. Cox, J. L. Schmidt, S. M. Kao, and Y. Read (1988) AI Magazine, 9 (1), 27-45. Real-time knowledge-based systems.

Lambert, M. and A. M. Agogino (1987) Proc. of the Second International Joint Conference on Human-Computer Interaction, 324. A graphical interface to an Influence Diagram based Expert System.

Lan, M.S. and Y. Naerheim (1985) Sensors and Controls in Manufacturing, ASME-PED 18, 49-56. In process detection of tool breakage in milling.

Miller, A. C., M. M. Merkhofer, R. A. Howard, J. E. Matheson and T. R. Rice (1976) Final Technical Report, DARPA # 2742, SRI International, Menlo Park, California. Development of automated aids for decision analysis.

Moore, E. A. and A. M. Agogino (1987) International Journal of Man-Machine Studies 26 (2), 213-230. INFORM: An architecture for expert-directed knowledge acquisition. (Also in Knowledge Acquisition Tools for Expert Systems, 2, 227-244, Academic Press, 1988.)

Nii, H. P., E. A. Feigenbaum, J. J. Anton and A. J. Rockmore (1982) AI Magazine, 23-35. Signal-to-symbol transformation: HASP/SIAP case study.

Olmsted, S. M. (1984) Ph.D. Dissertation, Engineering-Economic Systems Department, Stanford University. On representing and solving decision problems.

Pearl, J. (1986) Proceedings of the Fifth National Conference on Artificial Intelligence, AAAI 1, 339-343. On the logic of probabilistic dependencies.

Ramamurthi, K. (1988a) (Berkeley Expert Systems Technology Lab., Dept. of Mechanical Engineering, University of California, Berkeley.) User's manual for GraphIDES: Graphical Influence Diagram based Expert System.

Ramamurthi, K. and A. M. Agogino (1988b) Proceedings of the 1988 ASME International Computers in Engineering Conference, 2, 333-340. Real time expert system for fault tolerant supervisory control.

Rege, A. and A. M. Agogino (1986a) Knowledge-based Expert Systems for Manufacturing, ASME-PED 24, 67-83. Sensor-integrated expert system for manufacturing and process diagnostics.

Rege, A. and A. M. Agogino (1986b) Proceedings of the ICS-86, International Computer Symposium, IEEE 3, 1685-1691. Representing and solving probabilistic inference problems in expert systems.

Rege, A. and A. M. Agogino (1988) IEEE Transactions on Systems, Man and Cybernetics, 18 (3). Topological framework for representing and solving probabilistic inference problems in expert systems.

Russell, S. J. (1986) Ph.D. dissertation, Computer Science Department, Stanford University. Analogical and inductive reasoning.

Russell, S., S. Srinivas and A. M. Agogino (1988) Working Paper 88-0201-0, (Berkeley Expert Systems Technology Lab., Dept. of Mechanical Engineering, University of California, Berkeley, CA.) Inducing influence diagrams from examples.

Shachter, R. D. (1985) Proceedings of the AAAI Workshop on Uncertainty and Probability in Artificial Intelligence, 237-244. Intelligent probabilistic inference.

Shachter, R. D. (1986) Operations Research, 34 (6), 871-882. Evaluating influence diagrams.

Shachter, R.D. and D. E. Heckerman (1987) Artificial Intelligence Magazine, Fall. A backwards view of assessment.

Smith, J. (1989 - to appear) Annals of Statistics. Influence diagrams for statistical modelling.

Stein, J. L. and K. C. Shin (1985) Sensors and Controls for Automated Manufacturing and Robotics, ASME-PED 18, 57-66. Current monitoring of field controlled DC spindle drives.

Stein, J. L., D. Colvin, G. Clever and C. H. Wang (1984) Sensors and Controls for Automated Manufacturing and Robotics, ASME-PED 18, 45-63. Current monitoring on DC servo machine tool feed drives.

Tlusty, J. and G. C. Andrews (1983) Annals of the CIRP 32 (2 ), 563-572. A critical review of sensors for unmanned machining.

Wright, P. K. and D. A. Bourne (1988) Manufacturing Intelligence, Reading, Massachusetts: Addison-Wesley Publishing Co.


[ Home | Info | Syllabus | Readings | Students | Homework | Resources | News | Chat]

Last updated: 14 March 99
Send Comments to: Alice Agogino, aagogino@me.berkeley.edu
Copyright © 1999 Alice Agogino; All Rights Reserved.