Sequential Diagnosis by Abstraction

When a system behaves abnormally, sequential diagnosis takes a sequence of measurements of the system until the faults causing the abnormality are identified, and the goal is to reduce the diagnostic cost, defined here as the number of measurements. To propose measurement points, previous work employs a heuristic based on reducing the entropy over a computed set of diagnoses. This approach generally has good performance in terms of diagnostic cost, but can fail to diagnose large systems when the set of diagnoses is too large. Focusing on a smaller set of probable diagnoses scales the approach but generally leads to increased average diagnostic costs. In this paper, we propose a new diagnostic framework employing four new techniques, which scales to much larger systems with good performance in terms of diagnostic cost. First, we propose a new heuristic for measurement point selection that can be computed efficiently, without requiring the set of diagnoses, once the system is modeled as a Bayesian network and compiled into a logical form known as d-DNNF. Second, we extend hierarchical diagnosis, a technique based on system abstraction from our previous work, to handle probabilities so that it can be applied to sequential diagnosis to allow larger systems to be diagnosed. Third, for the largest systems where even hierarchical diagnosis fails, we propose a novel method that converts the system into one that has a smaller abstraction and whose diagnoses form a superset of those of the original system; the new system can then be diagnosed and the result mapped back to the original system. Finally, we propose a novel cost estimation function which can be used to choose an abstraction of the system that is more likely to provide optimal average cost. Experiments with ISCAS-85 benchmark circuits indicate that our approach scales to all circuits in the suite except one that has a flat structure not susceptible to useful abstraction.


Introduction
When a system behaves abnormally, the task of diagnosis is to identify the reasons for the abnormality. For example, in the combinational circuit in Figure 1, given the inputs P ∧ Q ∧ ¬R, the output V should be 0, but is actually 1 due to the faults at gates J and B. Given a system comprising a set of components, a knowledge base modeling the behavior of the system, and the (abnormal) observed values of some system variables, a (consistency-based) diagnosis is a set of components whose failure (assuming the other components to be healthy), together with the observation, is logically consistent with the system model. In our example, {V}, {K}, {A}, and {J, B} are some of the diagnoses given the observation. In general, the number of diagnoses can be exponential in the number of system components, and only one of them will correspond to the set of actual faults.
In this paper, therefore, we consider the problem of sequential diagnosis (de Kleer & Williams, 1987), where a sequence of measurements of system variables is taken until the actual faults are identified. The goal is to reduce the diagnostic cost, defined here as the number of measurements. To propose measurement points, the state-of-the-art gde (general diagnosis engine) framework (de Kleer & Williams, 1987; de Kleer, Raiman, & Shirley, 1992; de Kleer, 2006) considers a heuristic based on reducing the entropy over a set of computed diagnoses. This approach generally has good performance in terms of diagnostic cost, but can fail to diagnose large systems when the set of diagnoses is too large (de Kleer & Williams, 1987; de Kleer et al., 1992; de Kleer, 2006). Focusing on a smaller set of probable diagnoses scales the approach but generally leads to increased average diagnostic costs (de Kleer, 1992).
We propose a new diagnostic framework employing four new techniques, which scales to much larger systems with good performance in terms of diagnostic cost. First, we propose a new heuristic that does not require computing the entropy of diagnoses. Instead, we consider the entropies of the system variables to be measured as well as the posterior probabilities of component failures. The idea is to select a component that has the highest posterior probability of failure (Heckerman, Breese, & Rommelse, 1995) and, from the variables of that component, measure the one that has the highest entropy. To compute probabilities, we exploit system structure so that a joint probability distribution over the faults and system variables is represented compactly as a Bayesian network (Pearl, 1988), which is then compiled into deterministic decomposable negation normal form (d-DNNF) (Darwiche, 2001; Darwiche & Marquis, 2002). d-DNNF is a logical form that can exploit the structure present in many systems to achieve compactness and can be used to compute probabilistic queries efficiently. Specifically, all the required posterior probabilities can be computed exactly by evaluating and differentiating the d-DNNF in time linear in the d-DNNF size (Darwiche, 2003).
Second, we extend hierarchical diagnosis, a technique from our previous work (Siddiqi & Huang, 2007), to handle probabilities so that it can be applied to sequential diagnosis, allowing larger systems to be diagnosed. Specifically, self-contained subsystems, called cones, are treated as single components and diagnosed only if they are found to be faulty in the top-level diagnosis. This significantly reduces the number of system components, allowing larger systems to be compiled and diagnosed. For example, the subcircuit in the dotted box in Figure 1 is a cone (with A as output and {P, D} as inputs) which contains a fault. Cone A is first determined, as a whole, to be faulty; only then is A compiled separately and diagnosed. In previous work (Siddiqi & Huang, 2007) we only dealt with the task of computing diagnoses, which did not involve measurements or probabilities; in the present paper, we present several extensions that allow the technique to carry over to sequential diagnosis.
Third, when the abstraction of a system is still too large to be compiled and diagnosed, we use a novel structure-based technique called cloning, which systematically modifies the structure of a given system C to obtain a new system C′ that has a smaller abstraction and whose diagnoses form a superset of those of the original system; the new system can then be diagnosed and the result mapped back to the original system. The idea is to select a system component G that is not part of a cone and hence cannot be abstracted away in hierarchical diagnosis, create one or more clones of G, and distribute G's parents (from a graph point of view) among the clones, in such a way that G and its clones now become parts of cones and disappear from the abstraction. Repeated applications of this operation can allow an otherwise unmanageable system to have a small enough abstraction for diagnosis to succeed.
Finally, we propose a novel cost estimation function that can predict the expected diagnostic cost when a given abstraction of the system is used for diagnosis. Our aim is to find an abstraction of the system that is more likely to give optimal average cost. For this purpose, we apply this function to various abstractions of the system, where different abstractions are obtained by destroying different cones in the system (by "destroying a cone" we mean ignoring the fact that it is a cone and including all its components in the abstraction). The abstraction with the lowest predicted cost can then be used for the actual diagnosis.
Experiments on ISCAS-85 benchmark circuits (Brglez & Fujiwara, 1985) indicate that we can solve for the first time nontrivial multiple-fault diagnostic cases on all the benchmarks, with good diagnostic costs, except one circuit that has a flat structure not susceptible to useful abstraction, and the new cost estimation function can often accurately predict the abstraction which is more likely to give optimal average cost.

Background and Previous Work
Suppose that the system to be diagnosed is formally modeled by a joint probability distribution Pr(X ∪ H) over a set of variables partitioned into X and H. Variables X are those whose values can be either observed or measured, and variables H are the health variables, one for each component, describing its health mode. The joint probability distribution Pr(X ∪ H) defines a set of system states.
Diagnosis starts in the initial (belief) state I_0, where values x_o of some variables X_o ⊆ X (we are using boldface uppercase letters to mean both sets and vectors) are given by the observation, and we wish to reach a goal state after measuring the values x_m of some variables X_m ⊆ X\X_o, |X_m| = n, one at a time, such that (the boldface 0 and 1 denote vectors of 0's and 1's):

$$\exists\, \mathbf{H}_f \subseteq \mathbf{H}:\quad \Pr(\mathbf{H}_f = \mathbf{0} \mid \mathbf{X}_o = \mathbf{x}_o, \mathbf{X}_m = \mathbf{x}_m) = 1 \quad\text{and}\quad \Pr(\mathbf{X}_o = \mathbf{x}_o, \mathbf{X}_m = \mathbf{x}_m \mid \mathbf{H}_f = \mathbf{0}, \mathbf{H}\setminus\mathbf{H}_f = \mathbf{1}) > 0$$

That is, in a goal state a set of components H_f is known to be faulty with certainty and no logical inconsistency arises if all other components are assumed to be healthy. Other types of goal conditions are possible. For example, if the health states of all components are to be determined with certainty, the condition will be that Pr(H = 0 | X_o = x_o, X_m = x_m) is 0 or 1 for all H ∈ H (such goals can only be reached if strong fault models are given, where strong fault models are explicit descriptions of abnormal behavior, as opposed to weak fault models where only the normal behavior is known).
Two special cases are worth mentioning: (1) If the initial state I_0 satisfies the goal condition with H_f = ∅, then the observation is normal and no diagnosis is required. (2) If the initial state I_0 satisfies the goal condition with some H_f ≠ ∅, then the observation is abnormal but the diagnosis is already complete (assuming that we are able to check probabilities as necessary); in other words, a sequence of length 0 solves the problem.
Following de Kleer and Williams (1987) we assume that all measurements have unit cost.Hence the objective is to reach a goal state in the fewest measurements possible.
The classical gde framework, on receiving an abnormal observation X_o = x_o, considers the Shannon entropy of the probability distribution over a set of computed diagnoses, which is either the set of minimum-cardinality diagnoses or a set of probable (leading) diagnoses. It proposes to measure the variable X whose value will, on average, reduce that entropy the most. The idea is that the probability distribution over the diagnoses reflects the uncertainty over the actual faults, and the entropy captures the amount of this uncertainty. After a measurement is taken, the entropy is updated by updating the posterior probabilities of the diagnoses, potentially reducing some of them to 0.
The results reported by de Kleer et al. (1992) involving single-fault cases for ISCAS-85 circuits indicate that this method leads to measurement costs close to those of optimal policies.However, a major drawback is that it can be impractical when the number of diagnoses is large (e.g., the set of minimum-cardinality diagnoses can be exponentially large).Focusing on a smaller set of probable diagnoses scales the approach but can increase the likelihood of irrelevant measurements and generally leads to increased average diagnostic costs (de Kleer, 1992).
From here on, we shall use combinational circuits as an example of the type of systems we wish to diagnose. Our approach, however, applies as well to other types of systems as long as a probabilistic model is given that defines the behavior of the system. In Sections 4 and 5 we present the new techniques we have introduced to significantly enhance the scalability of sequential diagnosis. We start, however, by presenting in the following section the system modeling and compilation method that underlies our new diagnostic system.

[Figure 2: Bayesian network for the circuit in Figure 1 (left). CPTs for nodes P, J, and okJ (right); recoverable entries: θ_P = 0.5 for P = 1 and for P = 0; θ_okJ = 0.9 for okJ = 1 and 0.1 for okJ = 0.]

System Modeling and Compilation
In order to define a joint probability distribution Pr(X ∪ H) over the system behavior, we first assume that the prior probability of failure Pr(H = 0) is given for each component H ∈ H as part of the input to the diagnosis task (de Kleer & Williams, 1987). For example, the small table with two entries on the top-right of Figure 2 gives the prior probability of failure for gate J as 0.1.

Conditional Probability Tables
Prior fault probabilities alone do not define the joint probability distribution Pr(X ∪ H).
In addition, we need to specify for each component how its output is related to its inputs and health mode.A conditional probability table (CPT) for each component does this job.
The CPT shown on the bottom (right) of Figure 2, for example, defines the behavior of gate J: each entry gives the probability of its output (J) being a particular value given the value of its input (P) and the value of its health variable (okJ). In case okJ = 1, the probabilities are always 0 or 1, as the behavior of a healthy gate is deterministic. The case of okJ = 0 defines the fault model of the gate, which is also part of the input to the diagnosis task. In our example, we assume that both output values have probability 0.5 when the gate is broken. For simplicity we assume that all gates have two health modes (i.e., each health variable is binary); the encoding and compilation to be described later, however, allow an arbitrary number of health modes.
Given these tables, the joint probability distribution over the circuit behavior can be obtained by realizing that the gates of a circuit satisfy an independence property, known as the Markov property: Given its inputs and health mode, the output of a gate is independent of any wire which is not a descendant of the gate (a wire X is a descendant of a gate Y if X can be reached following a path from Y to an output of the circuit in the direction towards the circuit outputs).This means that the circuit can be effectively treated as a Bayesian network in the straightforward way, by having a node for each wire and each health variable, and having an edge going from each input of a gate to its output, and also from the health variable of a gate to its output.Figure 2 shows the result of this translation for the circuit in Figure 1.
The joint probability distribution encoded in the Bayesian network provides, via the chain rule, the basis for computing any posterior probabilities that we may need when proposing measurement points. However, it does not provide an efficient way of doing so. Specifically, computing a posterior Pr(X = x | Y = y), given the values y of all the variables Y with known values, involves summing out all variables other than X and Y, which has a complexity exponential in the number of such variables if done naively.
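To make the cost of naive summation concrete, the following Python sketch enumerates every assignment of a tiny two-gate network and sums out the unobserved variables. The gate types (J as an inverter of P, A as an AND of J and D), the uniform input priors, and the function names are assumptions for illustration only, not taken from Figure 1; the CPT numbers follow the text (healthy probability 0.9, a broken gate outputs 1 with probability 0.5). The loop visits 2^n assignments, which is the exponential behavior that compilation into d-DNNF is meant to avoid.

```python
import itertools


def cpt_gate(out, ok, healthy_value):
    """Pr(output | inputs, health): deterministic when healthy, 0.5 when broken."""
    if ok:
        return 1.0 if out == healthy_value else 0.0
    return 0.5


def joint(P, D, J, A, okJ, okA):
    """Chain rule over the (hypothetical) network: one CPT entry per node."""
    return (0.5 * 0.5                                   # Pr(P) Pr(D), assumed uniform
            * (0.9 if okJ else 0.1) * (0.9 if okA else 0.1)
            * cpt_gate(J, okJ, not P)                   # Pr(J | P, okJ), J = NOT P assumed
            * cpt_gate(A, okA, J and D))                # Pr(A | J, D, okA), A = J AND D assumed


NAMES = ["P", "D", "J", "A", "okJ", "okA"]


def posterior(query, evidence):
    """Pr(query | evidence), summing out all other variables naively (2^n terms)."""
    num = den = 0.0
    for values in itertools.product([False, True], repeat=len(NAMES)):
        a = dict(zip(NAMES, values))
        if any(a[k] != v for k, v in evidence.items()):
            continue
        p = joint(**a)
        den += p
        if all(a[k] == v for k, v in query.items()):
            num += p
    return num / den


# Abnormal observation: with P = D = 1 a healthy cone would output A = 0.
obs = {"P": True, "D": True, "A": True}
print(posterior({"okJ": False}, obs))
```

Under these assumptions the posterior that J is broken comes out at roughly 0.53.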

Propositional Modeling
It is known that a Bayesian network can be encoded into a logical formula and compiled into d-DNNF, which, if successful, allows posterior probabilities of all variables to be computed efficiently (Darwiche, 2003).For the purposes of sequential diagnosis, we encode the Bayesian network as follows.
Consider the subcircuit in the dotted box in Figure 1 as an example, which can be modeled as the following formula: Specifically, each signal of the circuit translates into a propositional variable (A, D, P, J), and for each gate, an extra variable is introduced to model its health (okA, okJ). The formula is such that when all health variables are true the remaining variables are constrained to model the functionality of the gates. In general, for each component X, we have okX → NormalBehavior(X).
Note that the above formula fails to encode half of the CPT entries, namely those where okJ = 0. In order to complete the encoding of the CPT of node J, we introduce an extra Boolean variable θ_J and write ¬okJ → (J ↔ θ_J). Finally, the health variables (okA, okJ) are associated with the probabilities of the respective gates being healthy (0.9 in our experiments), and each θ-variable (θ_J) is associated with the probability of the corresponding gate giving an output of 1 when broken (0.5 in our experiments, thus assuming that the output of a faulty gate is probabilistically independent of its inputs).
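As a sanity check on the encoding, the sketch below computes weighted model counts of okJ → NormalBehavior(J) together with ¬okJ → (J ↔ θ_J) and recovers the CPT of J. It assumes, hypothetically, that J is an inverter (so NormalBehavior(J) is J ↔ ¬P); wire variables carry weight 1 and only okJ and θ_J are weighted, in line with the encoding described in this section. The function names are illustrative.

```python
import itertools

W_OK = {True: 0.9, False: 0.1}      # weight of okJ (healthy with probability 0.9)
W_THETA = {True: 0.5, False: 0.5}   # weight of theta_J (broken gate outputs 1 w.p. 0.5)


def satisfies(P, J, okJ, thJ):
    healthy_clause = (not okJ) or (J == (not P))   # okJ -> NormalBehavior(J), J assumed NOT P
    broken_clause = okJ or (J == thJ)              # not okJ -> (J <-> theta_J)
    return healthy_clause and broken_clause


def wmc(**fixed):
    """Weighted model count over P, J, okJ, thJ with some variables fixed.
    Wire variables have weight 1: a single indicator serves both values."""
    total = 0.0
    for P, J, okJ, thJ in itertools.product([False, True], repeat=4):
        a = {"P": P, "J": J, "okJ": okJ, "thJ": thJ}
        if any(a[k] != v for k, v in fixed.items()) or not satisfies(P, J, okJ, thJ):
            continue
        total += W_OK[okJ] * W_THETA[thJ]
    return total


def pr_J_given_P(p):
    """Pr(J = 1 | P = p) read off the encoding."""
    return wmc(P=p, J=True) / wmc(P=p)


print(pr_J_given_P(True), pr_J_given_P(False))
```

Both conditionals match the CPT one would write directly: a healthy inverter contributes 0.9 when J = ¬P, and a broken one contributes 0.1 · 0.5 for either output value.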
The above encoding of the circuit is similar to the encoding of Bayesian networks described by Darwiche (2003) in the following way: According to the encoding by Darwiche, for every node in a Bayesian network and for every value of it there is an indicator variable.Similarly for every conditional probability there is a network parameter variable.In our encoding, the variables for the wires are analogous to the network indicators, where the encoding is optimized such that there is a single indicator for both values of the wire.Also, our encoding exploits the logical constraints and does not generate network parameters for zeros and ones in the CPT.Finally, the encoding for a node that represents a health variable has been optimized such that we only need a single ok-variable which serves both as an indicator and as a network parameter.
Once all components are encoded as described above, the union (conjunction) of the formulas is compiled into d-DNNF.The required probabilities can be exactly computed by evaluating and differentiating the d-DNNF in time linear in its size (Darwiche, 2003).Details of the compilation process are discussed by Darwiche (2004), and the computation of probabilities is described in Appendix A.
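The two linear-time passes can be illustrated on a hand-built d-DNNF for the encoding of gate J. The sketch below, with hypothetical class and function names and J again assumed to be an inverter, evaluates the graph bottom-up (each literal leaf contributes its evidence indicator times its weight, so the root holds Pr(evidence)) and then propagates partial derivatives top-down, in the spirit of Darwiche (2003). The derivative at a literal leaf, times its weight, gives the joint probability of that literal with the evidence on its own variable retracted.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                       # "lit", "and" or "or"
    var: str = ""                   # literal leaves only: variable name
    val: bool = True                # literal leaves only: polarity
    children: list = field(default_factory=list)


class DDNNF:
    """Nodes are stored children-first, so list order is a topological order."""

    def __init__(self):
        self.nodes = []

    def _add(self, node):
        self.nodes.append(node)
        return len(self.nodes) - 1

    def lit(self, var, val=True):
        return self._add(Node("lit", var, val))

    def conj(self, *kids):
        return self._add(Node("and", children=list(kids)))

    def disj(self, *kids):
        return self._add(Node("or", children=list(kids)))

    def evaluate(self, weights, evidence):
        """Upward pass; the root (last node) ends up holding Pr(evidence)."""
        value = []
        for n in self.nodes:
            if n.kind == "lit":
                lam = 1.0 if evidence.get(n.var, n.val) == n.val else 0.0
                value.append(lam * weights.get((n.var, n.val), 1.0))
            elif n.kind == "and":
                v = 1.0
                for c in n.children:
                    v *= value[c]
                value.append(v)
            else:
                value.append(sum(value[c] for c in n.children))
        return value

    def differentiate(self, value):
        """Downward pass: derivative of the root value w.r.t. every node."""
        dr = [0.0] * len(self.nodes)
        dr[-1] = 1.0
        for i in reversed(range(len(self.nodes))):
            n = self.nodes[i]
            for c in n.children:
                if n.kind == "or":
                    dr[c] += dr[i]
                else:  # "and": multiply by the values of the sibling children
                    others = 1.0
                    for s in n.children:
                        if s != c:
                            others *= value[s]
                    dr[c] += dr[i] * others
        return dr


def joint_marginal(d, dr, weights, var, val):
    """Pr(var = val, e'), where e' is the evidence with var retracted."""
    return sum(dr[i] * weights.get((var, val), 1.0)
               for i, n in enumerate(d.nodes)
               if n.kind == "lit" and n.var == var and n.val == val)


# Hand-built d-DNNF for okJ -> (J <-> not P) and not okJ -> (J <-> thJ).
d = DDNNF()
P, nP, J, nJ = d.lit("P"), d.lit("P", False), d.lit("J"), d.lit("J", False)
ok, nok, th, nth = d.lit("okJ"), d.lit("okJ", False), d.lit("thJ"), d.lit("thJ", False)
healthy = d.conj(ok, d.disj(d.conj(P, nJ), d.conj(nP, J)), d.disj(th, nth))
broken = d.conj(nok, d.disj(d.conj(th, J), d.conj(nth, nJ)), d.disj(P, nP))
root = d.disj(healthy, broken)

w = {("P", True): 0.5, ("P", False): 0.5, ("okJ", True): 0.9, ("okJ", False): 0.1,
     ("thJ", True): 0.5, ("thJ", False): 0.5}    # wire J keeps weight 1
value = d.evaluate(w, {"P": True})
dr = d.differentiate(value)
print(value[root], joint_marginal(d, dr, w, "J", True))
```

With evidence P = 1 the root evaluates to Pr(P = 1) = 0.5, and one downward pass yields Pr(J = 1, P = 1) = 0.025 together with the marginals of every other literal.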
We now present our hierarchical diagnosis approach and propose a new measurement selection heuristic.

Hierarchical Sequential Diagnosis
An optimal solution to sequential diagnosis would be a policy, that is, a plan of measurements conditioned on previous measurement outcomes, where each path in the plan leads to a diagnosis of the system (Heckerman et al., 1995).As computing optimal policies is intractable in general, we follow the approach of heuristic measurement point selection as in previous work.
We start with a definition of Shannon's entropy ξ, which is defined with respect to a probability distribution of a discrete random variable X ranging over values x_1, x_2, ..., x_k. Formally:

$$\xi(X) = -\sum_{i=1}^{k} \Pr(X = x_i)\,\log \Pr(X = x_i). \qquad (3)$$

Entropy measures the amount of uncertainty over the value of the random variable. It is maximal when all probabilities Pr(X = x_i) are equal, and minimal when one of the probabilities is 1, corresponding nicely to our intuitive notion of the degree of uncertainty. In gde the entropy is computed for the probability distribution over the set of computed diagnoses (i.e., the value of the random variable X here ranges over the set of diagnoses). As mentioned earlier, this entropy can be difficult to compute when the number of diagnoses is large (de Kleer & Williams, 1987; de Kleer, 2006).

Baseline Approach
Since probabilities can be computed efficiently and exactly once the d-DNNF compilation succeeds, we now propose a new two-part heuristic that circumvents this scalability limitation. First, we consider the entropy of a candidate variable to be measured.

Heuristic Based on Entropy of Variable
Since a wire X only has two values, its entropy can be written as:

$$\xi(X) = -\left(p_x \log p_x + p_{\bar{x}} \log p_{\bar{x}}\right),$$

where p_x = Pr(X = 1 | Y = y) and p_x̄ = Pr(X = 0 | Y = y) are the posterior probabilities of X having values 1 and 0, respectively, given the values y of wires Y whose values are known.
While ξ(X) captures the uncertainty over the value of the variable, we can also interpret it as the expected amount of information gain provided by measuring the variable.Hence as a first idea we consider selecting a variable with maximal entropy for measurement at each step.
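The first-idea heuristic can be sketched in a few lines of Python. The base-2 logarithm is an assumption (the base only rescales ξ and does not change which variable is selected), and the input format, a map from wire names to Pr(X = 1 | Y = y), is hypothetical.

```python
import math


def binary_entropy(p):
    """xi(X) in bits for a binary wire with Pr(X = 1) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def max_entropy_variable(posteriors, known):
    """First idea: measure the unknown wire whose value is most uncertain.
    `posteriors` maps each wire X to Pr(X = 1 | Y = y)."""
    candidates = {x: p for x, p in posteriors.items() if x not in known}
    return max(candidates, key=lambda x: binary_entropy(candidates[x]))


print(max_entropy_variable({"J": 0.95, "K": 0.4, "D": 0.5}, known={"D"}))  # K
```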

Improving Heuristic Accuracy
This idea alone, however, did not work very well in our initial experiments.As would be confirmed by subsequent experiments, this is largely due to the fact that the (implicit) space of all diagnoses is generally very large and can include a large number of unlikely diagnoses, which tends to compromise the accuracy of the information gain provided by the entropy.The experiments to confirm this explanation are as follows.
When the d-DNNF compilation is produced, and before it is used to compute probabilities, we prune the d-DNNF graph so that models (satisfying variable assignments) corresponding to diagnoses with more than k broken components are removed. We set the initial k to the number of actual faults in the experiments, and observed that a significant reduction of diagnostic cost resulted in almost all cases. This improved performance is apparently due to the fact that the pruning updates the posterior probabilities of all variables, making them more accurate since many unlikely diagnoses have been eliminated.
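The effect of the pruning on posteriors can be illustrated without a d-DNNF by filtering an explicit list of weighted models. This brute-force sketch, with hypothetical names and illustrative (made-up) weights, drops every model with more than k broken components and renormalizes; it is not the pruning operation on the d-DNNF graph, only a stand-in that computes the corresponding posteriors on a toy input.

```python
def fault_posteriors(models, k=None):
    """Pr(okX = 0 | y) per component, optionally keeping only models with at
    most k broken components. `models` is a list of (health, weight) pairs:
    health maps component -> okX, and weight is Pr(model, y)."""
    kept = [(h, w) for h, w in models
            if k is None or sum(not ok for ok in h.values()) <= k]
    total = sum(w for _, w in kept)
    return {c: sum(w for h, w in kept if not h[c]) / total for c in models[0][0]}


# Illustrative weights for two components J and A (made-up numbers for the sketch).
models = [({"J": True, "A": False}, 0.045),
          ({"J": False, "A": True}, 0.045),
          ({"J": False, "A": False}, 0.005)]
print(fault_posteriors(models), fault_posteriors(models, k=1))
```

With k = 1 the double-fault model is removed and the posterior failure probability of J drops from about 0.53 to 0.5.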
In practice, however, the number of faults is not known beforehand and choosing an appropriate k for the pruning can be nontrivial (note that k need not be exactly the same as the number of actual faults for the pruning to help).Interestingly, the following heuristic, which is the one we will actually use, appears to achieve a similar performance gain in an automatic way: We select a component that has the highest posterior probability of failure (an idea from Heckerman et al., 1995; see Section 8), and then from the variables of that component, measure the one that has the highest entropy.This heuristic does not require the above pruning of the d-DNNF, and appears to improve the diagnostic cost to a similar extent by focusing the measurement selection on the component most likely to be broken (empirical results to this effect are given and discussed in Section 7.1).
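A minimal sketch of the two-step heuristic, assuming the posteriors have already been computed (for example by evaluating and differentiating the d-DNNF). The names are hypothetical, and falling back to the next most likely component when all variables of the first one are already known is an assumption of this sketch, not something the text specifies.

```python
import math


def binary_entropy(p):
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def select_measurement(fail_prob, comp_vars, var_prob, known):
    """Pick the component most likely to be broken, then its most uncertain variable.
    fail_prob: component -> Pr(okX = 0 | y); var_prob: wire -> Pr(X = 1 | y);
    comp_vars: component -> its input and output wires."""
    for comp in sorted(fail_prob, key=fail_prob.get, reverse=True):
        cands = [v for v in comp_vars[comp] if v not in known]
        if cands:  # assumed fallback: skip components whose variables are all known
            return comp, max(cands, key=lambda v: binary_entropy(var_prob[v]))
    return None, None


fp = {"A": 0.3, "J": 0.6}
cv = {"J": ["P", "J"], "A": ["J", "D", "A"]}
vp = {"P": 1.0, "J": 0.2, "D": 0.7, "A": 1.0}
print(select_measurement(fp, cv, vp, known={"P", "A"}))  # ('J', 'J')
```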

The Algorithm
We start by encoding the system as a logical formula as discussed in Section 3, where a subset of the variables are associated with numbers representing the prior fault probabilities and probabilities involved in the fault models of the components, which is then compiled into d-DNNF ∆.
The overall sequential diagnosis process we propose is summarized in Algorithm 1. The inputs are a system C, its d-DNNF compilation ∆, the set of faults D (initially empty; it will be used in the hierarchical approach), a set of known values y of variables, and an integer k specifying the fault cardinality bound (this is for running the model pruning experiments described in Section 4.1.2 and is not required for diagnosis using our final heuristic). When the fault cardinality bound k is given, we reduce ∆ by pruning some models (line 1), using the function reduce(∆, D, k − |D|). The function reduce accepts as arguments the current DNNF ∆, the set of known faults D, and the upper bound k − |D| on the cardinality of the remaining faults, and returns the pruned DNNF. It excludes the known faults in D when computing the minimum cardinality of ∆, and then uses k − |D| as the bound on the remaining faults (explained further in Appendix B). ∆ is reduced the first time psd is called and again each time D changes (i.e., when a component is found faulty). We then evaluate (line 2) and differentiate (line 3) ∆ (see Appendix A), select a measurement point and take the measurement (line 7), and repeat the process (line 8) until the stopping criteria are met (line 5).

Algorithm 1 (psd), lines 6–8:
6: return <D, y>
7: Measure variable X which is the best under a given heuristic
8: Add the measured value x of X to y, and go back to line 1
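The control flow of Algorithm 1 without model pruning can be sketched as follows. This is not the authors' implementation: a brute-force enumerator over a hypothetical two-gate system (J = NOT P, A = J AND D, gate types assumed) stands in for evaluating and differentiating ∆, and the names psd, pr and measure are illustrative. The sketch assumes an abnormal observation and that some unmeasured variable remains whenever a measurement is needed.

```python
import itertools
import math

# Hypothetical two-gate system (gate types are assumptions): J = NOT P, A = J AND D.
GATES = {"J": (lambda p: not p, ["P"]), "A": (lambda j, d: j and d, ["J", "D"])}
COMP_VARS = {"J": ["P", "J"], "A": ["J", "D", "A"]}   # inputs and output of each gate
VARS = ["P", "D", "J", "A", "okJ", "okA"]


def weight(a):
    """Pr of a full assignment: uniform inputs, Pr(ok) = 0.9, broken output 1 w.p. 0.5."""
    w = 0.25
    for g, (f, ins) in GATES.items():
        if a["ok" + g]:
            if a[g] != f(*(a[i] for i in ins)):
                return 0.0
            w *= 0.9
        else:
            w *= 0.1 * 0.5
    return w


def pr(evidence):
    """Pr(evidence) by enumeration; a stand-in for evaluating the d-DNNF."""
    total = 0.0
    for vals in itertools.product([False, True], repeat=len(VARS)):
        a = dict(zip(VARS, vals))
        if all(a[k] == v for k, v in evidence.items()):
            total += weight(a)
    return total


def entropy(p):
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def psd(y, measure):
    """Sketch of the baseline loop (no cardinality pruning, abnormal observation)."""
    D = set()
    while True:
        py = pr(y)
        # A component is known to be faulty when Pr(okX = 1, y) = 0.
        D |= {g for g in GATES if pr({**y, "ok" + g: True}) == 0.0}
        # Stop when D, with all other components healthy, is consistent with y.
        health = {"ok" + g: g not in D for g in GATES}
        if D and pr({**y, **health}) > 0.0:
            return D, y
        # Component most likely to be broken, then its most uncertain variable.
        fail = {g: pr({**y, "ok" + g: False}) / py for g in GATES if g not in D}
        comp = max(fail, key=fail.get)
        cands = [v for v in COMP_VARS[comp] if v not in y]
        cands = cands or [v for v in VARS if v not in y and not v.startswith("ok")]
        x = max(cands, key=lambda v: entropy(pr({**y, v: True}) / py))
        y = {**y, x: measure(x)}


# Hypothetical ground truth: J is broken and outputs 1.
faults, known = psd({"P": True, "D": True, "A": True}, measure=lambda v: {"J": True}[v])
print(faults, known)
```

Starting from the abnormal observation P = D = A = 1 with J actually faulty, the sketch measures J once, finds Pr(okJ = 1, y) = 0, and stops because {J} with A healthy is consistent with all known values.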
The stopping criteria on line 5 are given earlier in Section 2 as the goal condition, i.e., we stop when the abnormal observation is explained by all the faulty components D already identified, assuming that the other components are healthy. A faulty component X is identified when Pr(okX = 1, y) = 0, where y are the values of variables that are already known; as mentioned earlier, these probabilities are obtained for all variables simultaneously in the d-DNNF differentiation process. Finally, the condition that the current set of faulty components, with health modes H_f, explains the observation is satisfied when Pr(H_f = 0, H\H_f = 1, y) > 0, which is checked by a single evaluation of the original d-DNNF. The algorithm returns the actual faults together with the new set of known values of variables (line 6).

Hierarchical Approach
We now scale our approach to handle larger systems using the idea of abstraction-based hierarchical diagnosis (Siddiqi & Huang, 2007).The basic idea is that the compilation of the system model into d-DNNF will be more efficient and scalable when the number of system components is reduced.This can be achieved by abstraction, where subsystems, known as cones, are treated as single components.An example of a cone is depicted in Figure 1.The objective here is to use a single health variable and failure probability for the entire cone, hence significantly reducing the size of the encoding and the difficulty of compilation.Once a cone is identified as faulty in the top-level diagnosis, it can then be compiled and diagnosed, in a recursive fashion.
We now give the formal definition of abstraction from our previous work:

Abstraction of System
Abstraction is based upon the structural dominators (Kirkland & Mercer, 1987) of a system. A component X dominates a component Y, or X is called a dominator of Y, if every path from Y to any output of the system contains X. A cone corresponds precisely to the set of components dominated by a component. A cone may contain further cones, leading to a hierarchy of cones.
A system can be abstracted by treating all maximal cones in it as black boxes (a maximal cone is one that is either contained in no other cone or contained in exactly one other cone, which is the whole system). In our example, cone A can be treated as a virtual gate with two inputs {P, D} and the output A. The abstraction of a system can be formally defined as follows. Definition 1 (Abstraction of System). Given a system C, let C′ = C if C has a single output; otherwise let C′ be C augmented with a dummy component collecting all outputs of C. Let O be the only output of C′. The abstraction A_C of system C is then the set of components X ∈ C such that X is not dominated in C′ by any component other than X and O.
For example, A_C = {A, B, D, K, V}. J ∉ A_C, as J cannot reach any output without passing through A, which is a dominator of J.
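Definition 1 can be computed directly from the circuit graph: the dominators of a gate are the gate itself plus the intersection of the dominators of its fanout gates. The sketch below applies that recurrence to a hypothetical topology chosen only so that its abstraction is {A, B, D, K, V} with J inside cone A, as in the example; it is not read off Figure 1. Primary inputs are left out, and every gate is assumed to reach an output.

```python
from functools import lru_cache

# Hypothetical topology (an assumption, not Figure 1): gate -> input wires.
FANIN = {"B": ["Q", "R"], "D": ["B", "R"], "J": ["P"],
         "A": ["J", "D"], "K": ["B", "D"], "V": ["A", "K"]}
OUTPUTS = ["V"]


def abstraction(fanin, outputs):
    """Return (abstraction, cones) over gates; primary inputs are left out."""
    fanin = dict(fanin)
    root = outputs[0]
    if len(outputs) > 1:                     # dummy component collecting all outputs
        root = "__O__"
        fanin[root] = list(outputs)
    fanout = {g: [] for g in fanin}
    for g, ins in fanin.items():
        for i in ins:
            if i in fanout:                  # skip primary inputs
                fanout[i].append(g)

    @lru_cache(maxsize=None)
    def dom(x):
        """Gates lying on every path from x to the single output (x included)."""
        if x == root:
            return frozenset([x])
        return frozenset.intersection(*(dom(s) for s in fanout[x])) | {x}

    gates = [g for g in fanin if g != "__O__"]
    abst = {x for x in gates if dom(x) <= {x, root}}
    cones = {g: {x for x in gates if g in dom(x)} for g in gates}
    return abst, cones


abst, cones = abstraction(FANIN, OUTPUTS)
print(sorted(abst), sorted(cones["A"]))  # ['A', 'B', 'D', 'K', 'V'] ['A', 'J']
```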
In our previous work (Siddiqi & Huang, 2007), we only dealt with the task of computing minimum-cardinality diagnoses, which does not involve probabilities or measurement selection.In the context of sequential diagnosis, several additional techniques have been introduced, particularly in the computation of prior failure probabilities for the cones and the way measurement points are selected, outlined below.

Propositional Encoding
We start with a discussion of the hierarchical encoding for probabilistic reasoning, which is similar to the hierarchical encoding presented in our previous work (Siddiqi & Huang, 2007). Specifically, for the diagnosis of the abstraction A_C of the given system C, health variables are only associated with the components A_C\I_C, which are the gates {A, B, D, K, V} in our example (I_C stands for the set of inputs of the system C). Thus the gate J in Figure 1 will not be associated with a health variable, as J is a wire internal to the cone rooted at A. Consequently, only the nodes representing the components A_C\I_C will have health nodes associated with them in the corresponding Bayesian network. Hence the node okJ is removed from the Bayesian network in Figure 2.
In addition, we define the failure of a cone to be when it outputs the wrong value, and introduce extra clauses to model the abnormal behavior of the cone.For example, the encoding given in Section 3.2 for cone A in Figure 1 (in the dotted box) is as follows: The first part of the formula encodes the normal behavior of gate J (without a health variable); the next encodes the normal behavior of the cone; the last encodes that the cone outputs a wrong value when it fails.Other gates (that are not roots of cones) in the abstraction A C are encoded normally as described in Section 3.2.
Note that the formulas for all the components in a cone together encode a single CPT for the whole cone, which provides the conditional probability of the cone's output given the health and inputs of the cone, instead of the health and inputs of the component at the root of the cone.For example, the above encoding is meant to provide the conditional probability of A given P , D, and okA (instead of J, D, and okA), where okA represents the health mode of the whole cone and is associated with its prior failure probability, which is initially unknown to us and has to be computed for all cones (explained below).Such an encoding of the whole system provides a joint probability distribution over the variables

Prior Failure Probabilities for Cones
When a cone is treated as a single component, its prior probability of failure as a whole can be computed given the prior probabilities of components and cones inside it. We do this by creating two copies ∆_h and ∆_f of the cone, where ∆_h models only the healthy behavior of the cone (without health variables), and ∆_f includes the faulty behavior as well (i.e., the full encoding described in Section 3.2). The outputs of both ∆_h and ∆_f are collected into an XOR-gate X (when the output of the XOR-gate X equals 1, its two inputs are forced to differ in value). We then compute the probability Pr(X = 1), which is the probability of the outputs of ∆_h and ∆_f being different. This probability is computed by compiling the encoding into d-DNNF and evaluating it under X = 1.
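For a cone small enough to enumerate, Pr(X = 1) can be checked by brute force instead of compilation. The sketch assumes, hypothetically, that cone A consists of J = NOT P feeding A = J AND D, with uniform inputs, healthy probability 0.9 and θ = 0.5; the variable a_h plays the role of the output of ∆_h, a_f the output of ∆_f, and the XOR condition becomes an inequality test.

```python
import itertools

PR_OK, THETA = 0.9, 0.5   # healthy probability and Pr(broken output = 1), as in the text


def cone_failure_probability():
    """Pr(X = 1) for X = XOR(output of Delta_h, output of Delta_f), by enumeration.
    Hypothetical cone: J = NOT P feeding A = J AND D; inputs P, D uniform."""
    total = 0.0
    for P, D, okJ, okA, thJ, thA in itertools.product([False, True], repeat=6):
        w = 0.25                                              # Pr(P) Pr(D)
        w *= (PR_OK if okJ else 1 - PR_OK) * (PR_OK if okA else 1 - PR_OK)
        w *= (THETA if thJ else 1 - THETA) * (THETA if thA else 1 - THETA)
        a_h = (not P) and D                    # Delta_h: healthy behavior only
        j_f = (not P) if okJ else thJ          # Delta_f: full encoding with faults
        a_f = (j_f and D) if okA else thA
        if a_h != a_f:                         # XOR output is 1: the copies disagree
            total += w
    return total


print(cone_failure_probability())
```

Under these assumptions the result is about 0.0725: a broken A disagrees with the healthy output half of the time (0.1 · 0.5), and a broken J under a healthy A only when D = 1 and θ_J ≠ ¬P (0.9 · 0.1 · 0.25).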
Note that this procedure itself is also abstraction-based and hierarchical, performed bottom-up with the probabilities for the inner cones computed before those for the outer ones.Also note that it is performed only once per system as a pre-processing step.

Measurement Point Selection and Stopping Criteria
In principle, the heuristic to select variables for measurement and the stopping criteria are the same as in the baseline approach; however, a couple of details are worth mentioning.
First, when diagnosing the abstraction of a given system (or cone) C, the measurement candidates are restricted to the variables A_C ∪ I_C, ignoring the internal variables of the maximal cones; those are only measured if a cone as a whole has been found faulty.
Second, it is generally important to have full knowledge of the values of a cone's inputs before a final diagnosis of the cone is concluded. A diagnosis of a cone concluded with only partial knowledge of its inputs may miss some faults that are vital to the validity of the global diagnosis. The reason is that the diagnosis of the cone assumes that the unknown inputs can take either value, while in reality their values may become fixed when variables in other parts of the system are measured, causing the diagnosis of certain cones to become invalid, and possibly requiring the affected cones to be diagnosed once again to meet the global stopping criteria (see line 17 in Algorithm 2).
To avoid this situation while retaining the effectiveness of the heuristic, we modify the measurement point selection as follows when diagnosing a cone. After selecting a component with the highest probability of failure, we consider the variables of that component plus the inputs of the cone, and measure the one with the highest entropy. We do not conclude a diagnosis for the cone until the values of all its inputs become known (through measurement or deduction), except when the health of all the components in the cone has been determined without knowing all the inputs to the cone (it is possible to identify a faulty component, and with strong fault models also a healthy component, without knowing all its inputs). Note that the restriction of having to measure all the inputs of a cone can lead to a significant increase in cost compared with the baseline approach, especially when the number of inputs of a cone is large. This is discussed in detail in Section 6.

Algorithm 2 (partially recovered):
Evaluate(∆, y), Differentiate(∆)
13: else
14: goto line 3
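The two modifications for cones can be written as small helpers. The names and the data layout are hypothetical; the entropy-based choice among the returned candidates would proceed as in the baseline heuristic.

```python
def cone_candidates(comp, comp_vars, cone_inputs, known):
    """Measurement candidates while diagnosing a cone: the selected component's
    variables plus the cone's inputs, minus anything already known (order kept)."""
    merged = dict.fromkeys(list(comp_vars[comp]) + list(cone_inputs))
    return [v for v in merged if v not in known]


def may_conclude(cone_inputs, known, undetermined):
    """Conclude the cone's diagnosis only when every cone input is known, or when
    the health of all components in the cone (`undetermined` is empty) is settled."""
    return all(i in known for i in cone_inputs) or not undetermined


print(cone_candidates("J", {"J": ["P", "J"]}, ["P", "D"], known={"P"}))  # ['J', 'D']
```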

The Algorithm
Pseudocode for the hierarchical approach is given in Algorithm 2 as a recursive function. The inputs are a system C, a set of known values u_C of variables at the inputs I_C and outputs O_C of the system, and again the optional integer k specifying the fault cardinality bound for the purpose of experimenting with the effect of model pruning. We start with the d-DNNF compilation of the abstraction of the given system (line 1) and then use the function psd from Algorithm 1 to get a diagnosis B of the abstraction (line 3), assuming that the measurement point selection and stopping criteria in Algorithm 1 have been modified as described in Section 4.2.4. The abstract diagnosis B is then used to get a concrete diagnosis D in a loop (lines 4–14). Specifically, if a component G ∈ B is not the root of a cone, then it is added to D (line 14); otherwise cone G is recursively diagnosed (line 10) and the result is added to D (line 11). When recursively diagnosing a cone G, the subsystem contained in G is represented by D_G ∪ I_G, where D_G is the set of components dominated by G and I_G is the set of inputs of cone G.
Before recursively diagnosing a cone $G$, we compute an abnormal observation $u_G$ at the inputs and the output ($I_G \cup \{G\}$) of the cone $G$. The values of some of $G$'s inputs and output will have been either measured or deduced from the current set of measurements. The value of a variable $X$ is implied to be $x$ under the measurements $y$ if $\Pr(X = \neg x, y) = 0$, which is easy to check once $\Delta$ has been differentiated under $y$. The function Implications($\Delta$, $y$) (lines 7 and 15) implements this operation, which is used to compute the partial abnormal observation $u_G$ (line 8). A fault cardinality bound $k'$ for the cone $G$ is then inferred (line 9), and the algorithm is called recursively to diagnose $G$, given $u_G$ and $k'$.
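The implication test can be sketched in Python as follows. Instead of differentiating a d-DNNF, this illustration enumerates all completions of $y$ under an explicit joint distribution, which is exponential and only meant for tiny models. The helper names and the one-gate toy model (uniform inputs, $\Pr(ok_G = 1) = 0.9$, and equally likely output values for a faulty gate, mirroring the experimental setup described later) are assumptions made for illustration.

```python
from itertools import product
from typing import Callable, Dict, List

Assignment = Dict[str, int]


def implications(variables: List[str],
                 joint: Callable[[Assignment], float],
                 y: Assignment) -> Assignment:
    """Values implied by y: X = x is implied when Pr(X = not x, y) = 0."""
    free = [v for v in variables if v not in y]
    # Enumerate every completion of y that has nonzero probability.
    support = []
    for values in product((0, 1), repeat=len(free)):
        a = dict(y)
        a.update(zip(free, values))
        if joint(a) > 0:
            support.append(a)
    implied = {}
    for v in free:
        seen = {a[v] for a in support}
        if len(seen) == 1:  # the other value has probability zero together with y
            implied[v] = seen.pop()
    return implied


def and_gate_joint(a: Assignment) -> float:
    """Toy model of one AND gate with health variable okG (hypothetical)."""
    p_inputs = 0.25  # both inputs uniform and independent
    if a["okG"] == 1:
        return p_inputs * 0.9 * (1.0 if a["Out"] == (a["In1"] & a["In2"]) else 0.0)
    return p_inputs * 0.1 * 0.5  # a faulty gate outputs 0 or 1 with equal probability


VARS = ["In1", "In2", "okG", "Out"]
```

For example, observing inputs 1, 1 and output 0 implies $ok_G = 0$, and knowing the gate is healthy with inputs 0, 1 implies its output is 0.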
The recursive call returns the faults $T$ inside the cone $G$ together with the updated observation $u_G$. The observation $u_G$ may contain some new measurement results regarding the variables $I_G \cup \{G\}$, which are added to the set of measurements $y$ of the abstraction (line 11); other measurement results obtained inside the cone are ignored for the reasons explained in Section 4.2.4. The concrete diagnosis $D$ is augmented with the faults $T$ found inside the cone (line 11), and $\Delta$ is again evaluated and differentiated in light of the new measurements (line 12).
After the loop ends, the variable $u_C$ is updated with the known values of the inputs $I_C$ and outputs $O_C$ of the system $C$ (line 16). The stopping criteria are checked for the diagnosis $D$ (line 17) and, if they are met, the function returns the pair $\langle D, u_C \rangle$ (line 18); otherwise more measurements are taken until the stopping criteria (line 17) are met.
Since $D$ can contain faults from inside the cones, the compilation $\Delta$ cannot be used to check the stopping criteria for $D$ (note the change in the parameters to the function MeetsCriteria at line 17), as the probabilistic information regarding variables inside cones is not available in $\Delta$. The criteria are instead checked as follows. We maintain the depth level of every component in the system. The outputs of the system are at depth level 1, and the remaining components are assigned depth levels based on the length of their shortest route to an output of the system. For example, in Figure 1 gates B and J are at depth level 3, while A is at depth level 2; hence B and J are deeper than A. We first propagate the values of inputs in the system, and then propagate the fault effects of the components in $D$ one by one, by flipping their values to the abnormal ones and propagating them towards the system outputs such that deeper faults are propagated first (Siddiqi & Huang, 2007). Finally, we check whether the resulting values of the system outputs equal those in the observation ($y$).
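The depth levels and the output check can be sketched in Python as below. This is a simplified illustration under stated assumptions: each gate drives a wire with its own name, a faulty gate is modeled as producing the negation of its normal output, and the check evaluates the circuit once in topological order with every gate of $D$ inverted, rather than reproducing the one-by-one, deeper-first propagation of Siddiqi and Huang (2007). The circuit and all names are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, List, Set, Tuple

# Each gate drives the wire with its own name: name -> (boolean function, input wires).
Circuit = Dict[str, Tuple[Callable[..., int], List[str]]]


def depth_levels(circuit: Circuit, outputs: List[str]) -> Dict[str, int]:
    """System outputs have depth 1; other gates 1 + their shortest route to an output."""
    depth = {o: 1 for o in outputs}
    queue = deque(outputs)
    while queue:  # breadth-first search backwards from the outputs
        g = queue.popleft()
        for w in circuit[g][1]:
            if w in circuit and w not in depth:
                depth[w] = depth[g] + 1
                queue.append(w)
    return depth


def simulate(circuit: Circuit, inputs: Dict[str, int], faulty: Set[str]) -> Dict[str, int]:
    """Evaluate all wires; a gate in `faulty` outputs the negation of its normal value."""
    values = dict(inputs)

    def value(w: str) -> int:
        if w not in values:
            fn, ins = circuit[w]
            out = fn(*(value(i) for i in ins))
            values[w] = 1 - out if w in faulty else out
        return values[w]

    for g in circuit:
        value(g)
    return values


def explains(circuit: Circuit, inputs: Dict[str, int],
             observed_outputs: Dict[str, int], diagnosis: Set[str]) -> bool:
    """True if flipping the gates in `diagnosis` reproduces the observed outputs."""
    values = simulate(circuit, inputs, diagnosis)
    return all(values[o] == v for o, v in observed_outputs.items())


# Hypothetical three-gate circuit: X = a AND b, Y = NOT X, Z = X OR c.
circuit: Circuit = {
    "X": (lambda a, b: a & b, ["a", "b"]),
    "Y": (lambda x: 1 - x, ["X"]),
    "Z": (lambda x, c: x | c, ["X", "c"]),
}
inputs = {"a": 1, "b": 1, "c": 0}
observed = {"Y": 1, "Z": 0}  # the normal circuit would give Y = 0, Z = 1
```

Here a single fault at X accounts for both abnormal outputs, whereas a fault at Y alone leaves Z unexplained.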

Example
Suppose that we diagnose the abstraction of the circuit in Figure 1 with the given observation, and take the sequence of measurements $y = \{D = 1, K = 1, A = 1\}$. It is concluded from the abstract system model that, given the values of P and D, the value 1 at A is abnormal, so the algorithm concludes a fault at A. Note that Q = 1 and D = 1 suggest the presence of another fault besides A, triggering the measurement of gate B, which is also found faulty. The abstract diagnosis {A, B} meets the stopping criteria with respect to the abstract circuit. We then enter the diagnosis of cone A by a recursive call with observation $u_A = \{P = 1, B = 1, A = 1\}$. The diagnosis of cone A immediately reveals that cone E is faulty. Hence we make a further recursive call to diagnose E with the observation $u_E = \{P = 1, B = 1, E = 1\}$. The only unknown wire, J, is measured and gate J is found faulty, which explains the observations at the outputs of cones E and A, given the inputs P and B. The recursion terminates, and the abstract diagnosis $B = \{A, B\}$ yields the concrete diagnosis $D = \{J, B\}$, which meets the stopping criteria, and the algorithm terminates.

Component Cloning
In the preceding section, we proposed an abstraction-based approach to sequential diagnosis, which reduces the complexity of compilation and diagnosis by reducing the number of system components to be diagnosed. We now take one step further, aiming to handle systems that are so large that they remain intractable even after abstraction, as is the case for the largest circuits in the ISCAS-85 benchmark suite.
Our solution is a novel method that systematically modifies the structure of a system to reduce the size of its abstraction. Specifically, we select a component $G$ with parents $P$ (a component $X$ is a parent of a component $Y$, and $Y$ is a child of $X$, if the output of $Y$ is an input of $X$) that is not part of a cone and hence cannot be abstracted away in hierarchical diagnosis, and create a clone $G'$ of it according to some of its parents $P' \subset P$, in the sense that $G'$ inherits all the children of $G$ and feeds into $P'$, while $G$ no longer feeds into $P'$ (see Figures 3 and 4 for an example). The idea is to create a sufficient number of clones of $G$ so that $G$ and its clones become part of some cones and hence can be abstracted away. Repeated application of this operation can allow an otherwise unmanageable system to have a small enough abstraction for compilation and diagnosis to succeed. The hierarchical algorithm is then extended to diagnose the new system, and the result is mapped back to the original system. We show that, using this approach, we can now solve almost all the benchmark circuits.
Before going into the details of the new method, we distinguish it from a technique known as node splitting (Choi, Chavira, & Darwiche, 2007), which is used to solve MPE queries on a Bayesian network. Node splitting breaks enough edges in the network that the MPE query on the resulting network becomes easy to solve. Each broken edge is replaced with a root variable with a uniform prior. The resulting network is a relaxation or approximation of the original in that its MPE solution, which may be computed from its compilation, gives an upper bound on the MPE solution of the original network. A depth-first branch-and-bound search algorithm then searches for an optimal solution using these bounds to prune its search space. A similar approach is also used to solve Weighted Max-SAT problems (Pipatsrisawat & Darwiche, 2007).
This version of node splitting is not directly applicable in the present setting for the following reasons. If edges in a system are broken and redirected into new root variables (primary inputs), the resulting system represents a different input-output function from that of the original system. The abnormal observation on the original system may hence become a normal one on the new system (if the edges through which the fault propagates are broken), eliminating the basis for diagnosis. Our technique of component cloning, which can also be viewed as a version of node splitting, introduces clones of a component instead of primary inputs and preserves the input-output function of the system. Moreover, the new system is a relaxation of the original in that its diagnoses are a superset of those of the original.
We now formally define component cloning:
Definition 2 (Component Cloning). Let $G$ be a component in a system $C$ with parents $P$. We say that $G$ is cloned according to parents $P' \subset P$ when system $C$ results in a system $C'$ as follows:
• The edges going from $G$ to its parents $P'$ are removed.
• A new component $G'$ functionally equivalent to $G$ is added to the system such that $G'$ shares the inputs of $G$ and feeds into each of $P'$.
Figures 3 and 4 show an example where creating a clone $B'$ of $B$ according to $\{D\}$ results in a new circuit whose abstraction contains only the gates {A, D, K, V}, whereas the abstraction of the original circuit also contains gate B.
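Definition 2 amounts to a simple graph rewrite. The Python sketch below applies it to a circuit stored as a dictionary from gate names to (gate type, input wires). This representation, the function names, and the example fragment (in which B feeds E, A and D as in the text, but with invented gate types and inputs) are assumptions for illustration only.

```python
from typing import Dict, List, Set, Tuple

# name -> (gate type, input wires); each gate drives the wire named after it.
Circuit = Dict[str, Tuple[str, List[str]]]


def parents(circuit: Circuit, g: str) -> Set[str]:
    """Components whose inputs include the output of g."""
    return {p for p, (_, ins) in circuit.items() if g in ins}


def clone_component(circuit: Circuit, g: str, subset: Set[str], clone_name: str) -> Circuit:
    """Clone g according to `subset`, a nonempty proper subset of its parents."""
    if not subset or not subset < parents(circuit, g):
        raise ValueError("subset must be a nonempty proper subset of the parents of g")
    new = {name: (kind, list(ins)) for name, (kind, ins) in circuit.items()}
    kind, ins = new[g]
    new[clone_name] = (kind, list(ins))  # same function, same inputs as g
    for p in subset:  # redirect each edge g -> p to clone -> p
        p_kind, p_ins = new[p]
        new[p] = (p_kind, [clone_name if w == g else w for w in p_ins])
    return new


# Hypothetical fragment in which B feeds E, A and D.
original: Circuit = {
    "B": ("AND", ["P", "Q"]),
    "E": ("OR", ["B", "R"]),
    "A": ("AND", ["E", "B"]),
    "D": ("NOT", ["B"]),
}
cloned = clone_component(original, "B", {"D"}, "B'")
```

Because the clone copies both the gate type and the inputs of B, the rewritten circuit computes the same input-output function; only the fan-out of B is split.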

Choices in Component Cloning
There are two choices to be made in component cloning: which components do we clone, and, for each of them, how many clones do we create and how do they split the parents? Since the goal of cloning is to reduce the abstraction size, we only wish to clone those components that lie in the abstraction (i.e., not within cones). Among these, cloning the root of a cone cannot reduce the abstraction size, as it destroys the existing cone by reintroducing some of the components inside the cone into the abstraction. For example, cloning D according to K in Figure 4 will produce a circuit where D and its clone can be abstracted away but B is no longer dominated by D and hence is reintroduced into the abstraction. Therefore, the final candidates for cloning are precisely those components in the abstract system that are not roots of cones. Note that the order in which these candidates are processed is unimportant, since cloning any one of them reduces the abstraction size by precisely 1, if at all.
It then remains to determine for each candidate how many clones to create and how to connect them to the parents.To understand our final method, it helps to consider a naive method that simply creates |P| − 1 clones (where P is the set of parents) and has each clone, as well as the original, feed into exactly one parent.This way every parent of the component becomes the root of a cone and the component itself and all its clones are abstracted away.In Figure 3, for example, B has three parents {E, A, D}, and this naive method would create two clones of B for a total of three instances of the gate to split the three parents, which would result in the same abstraction as in Figure 4.
The number of clones can be reduced, however, by noting that some parents of the component may lie in the same cone, and a single clone of the component according to those parents suffices for that clone to be abstracted away. In the example of Figure 3, the parents E and A of B lie in the same cone A, and it suffices to create a single clone of B according to {E, A}, resulting in the same, more efficient cloning as in Figure 4.
More formally, we partition the parents of a component $G$ into subsets $P_1, P_2, \ldots, P_q$ such that those parents of $G$ that lie in the same cone are placed in the same subset and the rest in separate ones. We then create $q - 1$ clones of $G$ according to any $q - 1$ of these subsets, resulting in $G$ and all its clones being abstracted away. This process is repeated for each candidate component until the abstraction size is small enough or no further reduction is possible.
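The partitioning rule can be sketched as follows. The sketch assumes that the cone containing each parent has already been determined (for instance, by a dominator analysis) and is supplied as a hypothetical mapping `cone_root` from a parent to the root of its cone; parents absent from the mapping lie in no cone. The clone names are likewise invented.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def partition_parents(parent_set: Set[str], cone_root: Dict[str, str]) -> List[Set[str]]:
    """Group parents lying in the same cone; every other parent gets its own block."""
    by_cone: Dict[str, Set[str]] = defaultdict(set)
    singletons: List[Set[str]] = []
    for p in sorted(parent_set):
        if p in cone_root:
            by_cone[cone_root[p]].add(p)
        else:
            singletons.append({p})
    return [by_cone[r] for r in sorted(by_cone)] + singletons


def cloning_plan(g: str, parent_set: Set[str],
                 cone_root: Dict[str, str]) -> List[Tuple[str, Set[str]]]:
    """q - 1 clones for q blocks: g keeps the first block, each clone takes one other."""
    blocks = partition_parents(parent_set, cone_root)
    return [(f"{g}_clone{i}", block) for i, block in enumerate(blocks[1:], start=1)]
```

With the cone information of the Figure 3 example (E and A both in cone A), the plan for B contains a single clone according to {D}; with no cone information, it falls back to the naive $|P| - 1$ clones.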

Diagnosis with Component Cloning
The new system is functionally equivalent to the original and has a smaller abstraction, but it is not equivalent to the original for diagnostic purposes. As the new model allows a component and its clones to fail independently of each other, it is a relaxation of the original model in that the diagnoses of the new system form a superset of those of the original. Specifically, each diagnosis of the new system that, for every component, assigns the same health state to the component and all its clones corresponds to a diagnosis of the original system; other diagnoses are spurious and are to be ignored.
The core diagnosis process given in Algorithm 2 remains applicable to the new system, with only two minor modifications. First, spurious diagnoses are (implicitly) filtered out by assuming the same health state for all clones (including the original) of a component as soon as the health state of any one of them is known. Second, whenever the measurement of a clone of a component is proposed, the actual measurement is taken on the original component in the original system, since the clone exists only in the model (in other words, the new system is used for reasoning and the original for measurements).
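The two modifications, together with the mapping of a diagnosis back to the original system, can be sketched in Python. Here a hypothetical dictionary `clone_of` maps each clone to its original component, and health states are booleans (True for healthy); these conventions and the function names are assumptions, not part of sda as described.

```python
from typing import Dict, Iterable, Optional, Set


def original_of(c: str, clone_of: Dict[str, str]) -> str:
    """The original component that c stands for (c itself if it is not a clone)."""
    return clone_of.get(c, c)


def share_health(known: Dict[str, bool], clone_of: Dict[str, str],
                 components: Iterable[str]) -> Dict[str, bool]:
    """Give every clone (and the original) the health state learned for any of them."""
    comps = list(components)
    result = dict(known)
    for c, healthy in known.items():
        for other in comps:
            if original_of(other, clone_of) == original_of(c, clone_of):
                result[other] = healthy
    return result


def measurement_target(point: str, clone_of: Dict[str, str]) -> str:
    """A proposed measurement on a clone is taken on the original component."""
    return original_of(point, clone_of)


def to_original(diagnosis: Set[str], clone_of: Dict[str, str],
                components: Iterable[str]) -> Optional[Set[str]]:
    """Map a diagnosis of the cloned system back, or return None if it is spurious."""
    states: Dict[str, Set[bool]] = {}
    for c in components:
        states.setdefault(original_of(c, clone_of), set()).add(c in diagnosis)
    if any(len(s) > 1 for s in states.values()):
        return None  # a component and one of its clones disagree
    return {g for g, s in states.items() if True in s}


COMPONENTS = ["B", "B'", "D"]
CLONE_OF = {"B'": "B"}
```

A diagnosis that marks only the clone B' as faulty is spurious, while one that marks both B and B' maps back to the original diagnosis {B}.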
In principle, the presence of spurious diagnoses in the model can skew the measurement point selection heuristic (at least in the early stages of diagnosis, before the spurious diagnoses are gradually filtered out). However, an empirical analysis on smaller benchmarks that could be diagnosed both with and without cloning indicates, interestingly, that the overall diagnostic cost is only slightly affected. We discuss this in more detail in Section 7.3.

Diagnostic Cost Estimation
We now address an interesting issue stemming from an observation made while conducting experiments (detailed in the next section): while system abstraction is always beneficial to compilation, the diagnostic cost does not always improve with the associated hierarchical diagnosis. On the one hand, the hierarchical approach can help in cases that otherwise result in high costs using the baseline approach, by quickly finding faulty portions of the system, represented by a set of faulty cones, and then directing the sequential diagnosis to take measurements inside those cones, resulting in more useful measurements. On the other hand, it can introduce overhead in cases where it has to needlessly go through hierarchies to locate the actual faults and measure the inputs of the cones involved, while the baseline version can find them more directly and efficiently.
The overhead of the hierarchical approach can be quite high for faults that lie in cones with a large number of inputs. For example, the graphs in Figure 5 show the number of inputs (represented as dots) of various cones in the ISCAS-85 circuits. Most cones have a small number of inputs; however, some cones can have more than 30 inputs, especially in c432 and the circuits beyond c1908, which contribute to increased diagnostic cost in several cases (such an increase in cost due to cones was also confirmed by a separate set of experiments using a large set of systematically generated combinational circuits, detailed in Appendix C). To avoid the potentially high cost of diagnosing faults that lie in a cone with a large number of inputs, it is tempting to destroy that cone before compilation so that any fault in it can be found directly. However, because of the associated increase in the abstraction size, destroying cones may increase the cost for cases that could previously be solved more efficiently, and thus may have a negative impact overall. This calls for an automatic mechanism to predict the effect of destroying certain cones on the overall diagnostic cost, which is the subject of this section.
We propose a novel cost estimation function to predict the average diagnostic cost when a given abstraction of the system is considered for diagnosis, where different abstractions can be obtained by destroying different cones in the system. Since cones can be destroyed automatically, the function can be used to automatically propose an abstraction of the system that is more likely to give optimal average cost. The function uses only the hierarchical structure of the given abstraction to predict its cost and does not take into account other parameters that may also contribute to the cost, such as the probabilities. In addition, the function is limited to single-fault cases. Therefore, the expected cost computed by this function is only indicative and is not always accurate. However, experiments show that the function is often quite useful in proposing an abstraction of the system that is more likely to give optimal cost (as discussed in the next section).
To estimate the expected diagnostic cost, we assume that it is composed of two quantities, namely the isolation cost and the abstraction cost, which tend to vary in opposite directions. The isolation cost captures how well the given system abstraction can isolate the faulty portions of the system. Therefore, the isolation cost is minimal when a complete abstraction of the system is used (i.e., all cones are considered) and generally increases as cones are destroyed. The abstraction cost captures the overhead due to the introduction of cones. Hence, the abstraction cost is minimal (zero) when no abstraction is considered and generally increases as cones are introduced.
We define the isolation cost of diagnosis under an abstraction of the system to be the average cost required to isolate a single fault in the system using that abstraction. Similarly, we define the abstraction cost of diagnosis to be the average overhead cost required to diagnose a single fault in the system using that abstraction. The expected average cost of diagnosis under an abstraction of the system is then the sum of the isolation and abstraction costs for that abstraction. As different cones are destroyed in a given abstraction of the system, we expect the abstraction and isolation costs to change, and these changes determine whether the overall cost goes up or down (if the changes are uneven) or stays constant (if the changes are even). The idea is to obtain an abstraction of the system that strikes a balance between the two quantities and thus yields an overall optimal cost. Below we discuss how the isolation and abstraction costs can be estimated.
We noted in our experiments with the baseline approach that our heuristic can isolate a single fault in the system with a cost that is, on average, comparable to $\log_2$ of the number of measurement points in the system, which provided us with the basis for computing the isolation cost. In the hierarchical approach, when a fault lies inside a cone, one can first estimate the isolation cost of diagnosing the cone separately and then add it to the isolation cost of diagnosing the abstract system to get the average isolation cost for all (single) faults that lie in that cone. For example, when no cones are considered, the cost of isolating a fault in the circuit in Figure 3 is $\log_2(6) = 2.58$ (the values of P, Q, R, and V are already known). However, when cones are considered, the cost of isolating a fault that lies inside cone A is the sum of the isolation cost of the abstract circuit and the isolation cost of the subcircuit inside cone A, which is $\log_2(4) + \log_2(1) = 2$. Similarly, to get an average isolation cost for all single faults in the system when using the hierarchical approach, one can add the isolation cost of diagnosing the abstract system and the average of the isolation costs of diagnosing all the abstract components (where the isolation cost for an abstract component that is not a cone is zero). Note that the isolation cost of diagnosing a cone can be computed by again taking the abstraction of the cone.
To estimate the abstraction cost of diagnosis under a given abstraction, we first need to estimate the overhead cost involved for each individual component in the system under that abstraction. To estimate the overhead cost of a (possibly faulty) component, one can take the union of all the inputs and outputs of the cones in which that component lies; the number of such measurement points (approximately) constitutes the required overhead cost for that component. If a component does not lie in any cone, its overhead cost is zero. For example, when the circuit in Figure 3 is diagnosed using the hierarchical approach, to find gate J faulty one must first find cone A faulty, then cone E faulty, and then gate J faulty. So the overhead cost for gate J in this case is 1 + 2 + 1 = 4 (i.e., we have to measure wires A, B, E, and J, assuming that Q is known). The abstraction cost of diagnosis under a given abstraction of the system is then the average of the overhead costs of all the system components under that abstraction.
We now give formal definitions related to the cost estimation function. Let $MP_u(C)$ be the set of those measurement points in the system $C$ whose values are unknown, and $MP_u(G)$ the set of those inputs and output of an abstract or concrete component $G$ whose values are unknown. Let $p$ be the number of abstract components in an abstraction $A_C$ of system $C$. Let $G_i \in A_C$ be an abstract component (either a concrete component or a cone in the abstraction; a concrete component in the abstraction can be regarded as a trivial cone containing only the component itself). Let $D_{G_i}$ be the subsystem dominated by $G_i$ and $A_{G_i}$ the abstraction of that subsystem.
The isolation cost $IC(C, A_C)$ when an abstraction $A_C$ of the system $C$ is considered for diagnosis is the sum of $\log_2 |MP_u(A_C)|$ and the average of the isolation costs computed, in a similar manner, for the subsystems contained in the abstract components in $A_C$:

$$IC(C, A_C) = \log_2 |MP_u(A_C)| + \frac{1}{p} \sum_{i=1}^{p} IC(D_{G_i}, A_{G_i}) \qquad (5)$$

where $IC(D_{G_i}, A_{G_i})$ recursively computes the isolation cost of the subsystem contained in the abstract component $G_i$, using Equation 5, by taking its abstraction $A_{G_i}$ (this term is zero when $G_i$ is a concrete component). Note that when computing $IC(D_{G_i}, A_{G_i})$ we assume that the inputs and output of $G_i$ have already been measured. Thus $MP_u(D_{G_i})$ excludes the inputs and output of cone $G_i$.

To compute the abstraction cost of diagnosing the system under a given abstraction, we first compute the overhead costs of diagnosing individual cones in the abstraction. Then we multiply the abstraction cost for a cone by the number of components contained in that cone to get the total overhead cost for all the components in that cone. Adding up the overhead costs computed this way for all the cones in the abstraction and dividing this sum by the total number of concrete components in the whole system gives us the average overhead cost per component, which we call the abstraction cost. Formally, let there be $q$ cones in $A_C$. Then the abstraction cost $AC(C, A_C)$ when the abstraction $A_C$ of the system $C$ is considered for diagnosis is given as:

$$AC(C, A_C) = \frac{1}{|C|} \sum_{i=1}^{q} |D_{G_i}| \left( |MP_u(G_i)| + AC(D_{G_i}, A_{G_i}) \right) \qquad (6)$$

where $|C|$ is the total number of concrete components in $C$, $|D_{G_i}|$ is the number of (concrete) components contained in the cone $G_i$, and $|MP_u(G_i)| + AC(D_{G_i}, A_{G_i})$ recursively computes the abstraction cost of diagnosing the cone $G_i$, using Equation 6, by taking its abstraction $A_{G_i}$. Multiplying the abstraction cost of $G_i$ by $|D_{G_i}|$ effectively adds the cost of measuring the cone inputs and output to the overhead cost of every component inside the cone. Again, note that when computing $AC(D_{G_i}, A_{G_i})$ we assume that all the variables in $MP_u(G_i)$ have already been measured. Thus $MP_u(D_{G_i})$ excludes the inputs and output of cone $G_i$.
Finally, the total expected cost $EDC(C, A_C)$ of diagnosing a system $C$ when an abstraction $A_C$ of the system is considered for diagnosis is given as:

$$EDC(C, A_C) = IC(C, A_C) + AC(C, A_C) \qquad (7)$$
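Equations 5 to 7 can be evaluated recursively over a nested description of an abstraction. The Python sketch below assumes a hypothetical data structure in which each level records its number of unknown measurement points, and each cone records its size, its unknown inputs and output, and its own abstraction. The two example abstractions are invented for illustration and do not reproduce Figure 3.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AbstractComponent:
    size: int = 1                          # |D_G|: concrete components it contains
    mp_unknown: int = 0                    # |MP_u(G)|: unknown inputs/output of the cone
    inner: Optional["Abstraction"] = None  # A_G; None for a concrete component


@dataclass
class Abstraction:
    mp_unknown: int                        # |MP_u(A)|: unknown points at this level
    components: List[AbstractComponent]


def isolation_cost(a: Abstraction) -> float:
    """Equation 5; a concrete component contributes an isolation cost of zero."""
    own = math.log2(a.mp_unknown) if a.mp_unknown > 0 else 0.0
    subs = [isolation_cost(g.inner) if g.inner is not None else 0.0
            for g in a.components]
    return own + (sum(subs) / len(subs) if subs else 0.0)


def abstraction_cost(a: Abstraction) -> float:
    """Equation 6: cone overheads weighted by cone size, averaged per concrete component."""
    n = sum(g.size for g in a.components)
    total = sum(g.size * (g.mp_unknown + abstraction_cost(g.inner))
                for g in a.components if g.inner is not None)
    return total / n if n else 0.0


def expected_cost(a: Abstraction) -> float:
    """Equation 7: EDC = IC + AC."""
    return isolation_cost(a) + abstraction_cost(a)


# Invented examples: six concrete components without cones, and a variant
# where three of them form a cone with two unknown inputs/outputs.
flat = Abstraction(6, [AbstractComponent() for _ in range(6)])
cone = AbstractComponent(size=3, mp_unknown=2,
                         inner=Abstraction(1, [AbstractComponent() for _ in range(3)]))
hier = Abstraction(4, [AbstractComponent(), AbstractComponent(), AbstractComponent(), cone])
```

For these invented numbers, the flat abstraction has the lower estimated cost ($\log_2 6 \approx 2.58$ versus $2 + 1 = 3$), illustrating how the estimate trades isolation cost against abstraction overhead.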

Experimental Results
This section provides an empirical evaluation of our new diagnostic system, referred to as sda (sequential diagnosis by abstraction), which implements the baseline, hierarchical, and cloning-based approaches described in Sections 4 and 5, and the cost estimation function described in Section 6. All experiments were conducted on a cluster of 32 computers consisting of two types of (comparable) CPUs, Intel Core Duo 2.4 GHz and AMD Athlon 64 X2 Dual Core Processor 4600+, both with 4 GB of RAM running Linux. A time limit of 2 hours and a memory limit of 1.5 GB were imposed on each test case. The d-DNNF compilation was done using the publicly available d-DNNF compiler c2d (Darwiche, 2004, 2005).
The CNF was simplified before compilation using the given observation, which allowed us to compile more circuits at the expense of requiring a fresh compilation per observation (see Algorithm 2, line 1). We generated single- and multiple-fault scenarios using ISCAS-85 benchmark circuits, where in each scenario a set of gates is assumed to be faulty. For single-fault cases of circuits up to c1355, we simulated equal prior probabilities of faults by generating n fault scenarios for each circuit, where n equals the number of gates in the circuit: each scenario contains a different faulty gate. We then randomly generated 5 test cases (abnormal observations) for each of these n scenarios. Doing the same for multiple-fault scenarios would not be practical due to the large number of combinations, so for each circuit up to c1355 (respectively, larger than c1355) we simply generated 500 (respectively, 100) random scenarios with the given fault cardinality and a random test case for each scenario.
Thus in each test case we have a faulty circuit where some gate or gates give incorrect outputs. The inputs and outputs of the circuit are observed. The values of internal wires are then computed by propagating the inputs in the normal circuit towards the outputs, followed by propagating the outputs of the assumed faulty gates one by one such that deeper faults are propagated first. The obtained values of internal wires are then used to simulate the results of taking measurements. We use $\Pr(ok_X = 1) = 0.9$ for all gates $X$ of the circuit. Note that such cases, where all gates fail with equal probability, are conceivably harder to solve, as the diagnoses will tend to be less differentiable. In addition, for each gate, the two output values are given equal probability when the gate is faulty. Again, this will tend to make the cases harder to solve due to the high degree of uncertainty. For each circuit and fault cardinality, we report the cost (number of measurements taken) and time (including the compilation time, in CPU seconds) to locate the faults, averaged over all test cases solved.
We present the experiments in four subsections demonstrating the effectiveness of the four techniques proposed in this paper, namely the new heuristic, hierarchical sequential diagnosis, component cloning, and the cost estimation function.

Effectiveness of Heuristic
We start with a comparison of the baseline algorithm of sda with gde and show that sda achieves similar diagnostic costs and scales to much larger circuits, hence illustrating the effectiveness of our new heuristic (along with the new way to compute probabilities).

Comparison with gde
We could obtain only the tutorial version of gde (Forbus & de Kleer, 1993) for the comparison, downloadable from http://www.qrg.northwestern.edu/BPS/readme.html. gde uses ATCON, a constraint language developed using the LISP programming language, to represent diagnostic problem cases. A detailed account of this language is given by Forbus and de Kleer (1993). Further, it employs an interactive user interface that proposes measurement points with their respective costs and lets the user enter the outcomes of measurements. For the purpose of comparison, we translated our problem descriptions into the language accepted by gde, and also modified gde to automatically read in the measurement outcomes from the input problem description. We also compiled the LISP code to machine-dependent binary code using the native C compiler to improve run-time performance. This version of gde, developed for tutorial purposes, computes the set of minimal diagnoses instead of probable diagnoses, which makes our comparison less informative. Nevertheless, we are able to make a reasonable comparison in terms of diagnostic cost, as the set of minimal diagnoses can also serve as a large set of probable diagnoses when components have equal prior probabilities. According to de Kleer (1992), the availability of more diagnoses aids heuristic accuracy, whereas focusing on a smaller set of probable diagnoses can be computationally more efficient but can increase the average diagnostic cost.
This version of gde was in fact unable to solve any circuit in ISCAS-85. To enable a useful comparison, we extracted a set of small subcircuits from the ISCAS-85 circuits: 50 circuits of size 13, 14, 15 and 16, and 10 circuits of size 17. For each circuit, we randomly generated 5 single-fault, 5 double-fault, and 5 triple-fault scenarios, and one test case (input/output vector) for each fault scenario. The comparison between gde and sda (baseline) on these benchmarks, given in Table 1, shows that sda performs as well as gde in terms of diagnostic cost.

Larger Benchmarks
To evaluate the performance of sda on the larger ISCAS-85 circuits, we have again conducted three sets of experiments, this time involving single, double, and five faults, respectively.As the version of gde available to us is unable to handle these circuits, in order to provide a systematic reference point for comparison we have implemented a random strategy where a random order of measurement points is generated for each circuit and used for all the test cases.This strategy also uses the d-DNNF to check whether the stopping criteria have been met.
Table 2 shows the comparison between the random strategy and sda using the baseline approach with two different heuristics, one based on entropies of wires alone (ew) and the other based also on failure probabilities (fp). For each of the three systems, we ran the same set of experiments with and without pruning the d-DNNF (using the known fault cardinality as described in Section 4.1.2), as indicated in the third column of the table. Only the test cases for the first four circuits could be solved. For the other circuits, the failure occurred during the compilation phase and hence affected both the random strategy and sda. It is clear that the diagnostic cost is significantly lower with both heuristics of sda than with the random strategy, whether or not pruning is used. It is also interesting to note that pruning significantly reduces the diagnostic cost for the random and sda-ew strategies, but has much less effect on sda-fp, except in a few cases (c1355 single-fault). Moreover, sda-fp generally dominates sda-ew, both with and without pruning.
We may also observe that (i) on the five-fault cases, sda-fp without pruning results in much lower diagnostic cost than sda-ew with pruning; (ii) on the double-fault cases, the two are largely comparable; and (iii) on the single-fault cases, the comparison is reversed. This indicates that as the fault cardinality rises, the combination of failure probabilities and wire entropies appears to achieve an effect similar to that of pruning. That sda-ew with pruning performs better than sda-fp without pruning on single-fault cases can be attributed to the fact that on these cases pruning is always exact and hence likely to yield the maximum benefit.

Effectiveness of Abstraction
We now report, in Table 3, the results of repeating the same experiments with sda-fp using the hierarchical approach.
Most notably, the running time generally decreases, and we are now able to handle two more circuits, namely c1908 and c2670, solving 139 of 300 cases for c1908 (25 single-, 15 double-, and 99 five-fault cases) and 258 of 300 cases for c2670 (100 single-, 60 double-, and 98 five-fault cases). Again, all failures occurred during the compilation phase: some observations do not simplify the theory enough for it to be compiled successfully, even after abstraction. In terms of diagnostic cost, the hierarchical approach is comparable to the baseline approach in most cases. On c432, the baseline approach consistently performs better than the hierarchical one at each fault cardinality, while the reverse is true on c1355. Note also that pruning helps further reduce the diagnostic cost to various degrees, as with the baseline approach.
As discussed earlier, the results confirm that the main advantage of the hierarchical approach is that larger circuits can be solved. For circuits that can also be solved by the baseline approach, the hierarchical approach may help reduce the diagnostic cost by quickly finding faulty portions of the circuit, represented by a set of faulty cones, and then directing the measurements inside them, which can result in more useful measurements (e.g., in the case of c1355). On the other hand, it may suffer in cases where it has to needlessly go through hierarchies to locate the actual faults, while the baseline version can find them more directly and efficiently (e.g., in the case of c432). This is further discussed in Section 7.4.

Effectiveness of Component Cloning
In this subsection we discuss the experiments with component cloning. We show that cloning does not significantly affect diagnostic cost and allows us to solve much larger circuits, in particular, nearly all the circuits in the ISCAS-85 suite.
Table 4 shows the result of the cloning pre-processing step on each circuit. The columns give the name of the circuit, the total number of gates in that circuit, the size of the abstraction of the circuit before cloning, the time spent on cloning, the total number of clones created in the circuit, and the abstraction size of the circuit obtained after cloning. On all circuits except c499, c1355, and c6288, a significant reduction in the abstraction size has been achieved. Circuit c6288 appears to be an extreme case with a very large abstraction that lacks hierarchy, while the gates in the abstractions of c499 and c1355 are all roots of cones, affording no opportunity for further reduction (note that these two circuits are already very simple and easy to diagnose).
We start by investigating the effect of component cloning on diagnostic performance. To isolate this effect, we use the baseline version of sda (i.e., without abstraction) and disable pruning. Table 5 summarizes the performance of baseline sda with cloning on the circuits c432 and c880. Comparing these results with the corresponding entries in Table 2 shows that cloning only slightly affects the overall diagnostic cost. We further observed that in a significant number of cases the proposed measurement sequence did not change after cloning, and in most of the other cases it changed only insubstantially. Moreover, in a number of cases the proposed sequence of measurements was substantially different, yet the actual diagnostic cost did not change much. Finally, note that the diagnosis time for c432 decreased after cloning, which can be ascribed to the general reduction in compilation complexity afforded by a smaller abstraction.

Our final set of experimental results with ISCAS-85 circuits, summarized in Table 6, illustrates the performance of hierarchical sequential diagnosis with component cloning, the most scalable version of sda. All the test cases for circuits c1908 and c2670 were now solved, and the largest circuits in the benchmark suite could now be handled. We solved all the cases for c5315, 164 of the 300 cases for c3540 (34 single-, 65 double-, and 65 five-fault cases), and 157 of the 300 cases for c7552 (60 single-, 26 double-, and 71 five-fault cases). In terms of diagnostic cost, cloning generally resulted in a slight improvement. In terms of time, the difference is insignificant for c432 and c880, while for the larger circuits (c1908 and c2670) diagnosis with cloning was more than an order of magnitude faster.

Effectiveness of Diagnostic Cost Estimation
Finally, we demonstrate the effectiveness of our cost estimation function. We show that it is often possible to destroy different cones and obtain different abstractions of a system that can all be compiled successfully. The cost estimation function can then select, for diagnosis, the abstraction that is more likely to give optimal average cost. These results also help explain why the hierarchical approach sometimes increases diagnostic cost compared with the baseline approach.
In these experiments, we use sda with cloning, include circuits up to c2670, and consider only single-fault test cases. We did not include the largest circuits in our analysis because they often could not be compiled after some of their cones were destroyed, so we could not obtain an overall picture of their actual cost. Test cases for circuits up to c1355 are the same as before. For circuits c1908 and c2670, we now use a more complete set of cases, as was done for the smaller circuits. Specifically, we generate n fault scenarios for each circuit, where n equals the number of gates in the circuit and each scenario contains a different faulty gate. We then randomly generate one test case for each of these n scenarios. In some cases we could not obtain a test case in reasonable time, and those scenarios were not used.
The results of the experiments are summarized in Table 7. For each circuit, the first row shows results when all cones are considered. The subsequent rows show results when all cones with more than a specified number of inputs (column 3) have been destroyed. When the value in column 3 is 0, all cones have been destroyed, which gives the trivial abstraction and is equivalent to the baseline approach. The last two columns show the actual average cost and time for diagnosing the circuit using the given abstraction. The columns labeled AC, IC, and EDC show the values obtained from Equations 6, 5, and 7, respectively, for the given abstraction.
The results show that we can often destroy several cones and still compile the circuit successfully. However, as one would expect, compilation time increases as more cones are destroyed. At some point the circuit fails to compile, and we stop destroying cones there. As cones are destroyed, the actual diagnostic cost follows a different trend on each circuit. For example, on c432 it improves significantly, while the reverse is true for c1355. On the remaining circuits the actual cost shows mixed trends, but the relative increase or decrease is generally less significant.
Comparing the isolation and abstraction costs (IC and AC, respectively) across abstractions confirms that each time some cones are destroyed, the isolation cost increases while the abstraction cost decreases. Because these two costs can change by different amounts, the overall cost may go up or down after the cones are destroyed. For example, for c432 the abstraction cost drops faster than the isolation cost rises as cones are destroyed, while for c1355 the two costs change at almost the same pace.
Comparing the predicted costs (EDC) with the actual costs shows that for c432, c499, c1908, and c2670 the predicted costs are often quite close to the actual costs, which demonstrates the relative accuracy of our approach. For these circuits, the cost estimation function can therefore accurately predict which abstraction is more likely to give optimal cost. For example, it correctly suggests that the baseline approach should be used with c432. For the other two circuits, c880 and c1355, the predicted and actual costs differ significantly, and the cost estimation function fails to give good predictions. c1355 seems to be a special case in which the actual diagnostic cost increases quite rapidly as cones are destroyed. The reason for this will be an interesting topic for future work.

Related Work
The idea of testing the most likely failing component comes from Heckerman et al. (1995). In their work, testing a component was a unit operation, and components were tested in decreasing order of their likelihood of failure, computed under a single-fault assumption. As the authors pointed out, this assumption could compromise the quality of the measurement sequence in multiple-fault cases. In our case, by contrast, testing each variable of a component is a unit operation, which calls for a more complex heuristic to minimize the number of tests; also, we do not need to assume a single fault. Our work also goes further in scalability by using several structure-based techniques: compilation, abstraction, and component cloning.

Chittaro & Ranon (2004) considered the computation of diagnoses using a hierarchical algorithm. Their method takes a hierarchical decomposition of the system as input, in which sets of components are aggregated into units. It computes a set of diagnoses at the most abstract level and then refines them hierarchically down to the most detailed level. Feldman & van Gemund (2006) developed a hierarchical diagnosis algorithm and tested it on reverse-engineered ISCAS-85 circuits (Hansen, Yalcin, & Hayes, 1999) that are available in high-level form. The idea is to decompose the system into hierarchies so as to minimize the sharing of variables between them. This can be done for well-engineered problems, and they formed hierarchies by hand for the ISCAS-85 circuits. The system is represented by a hierarchical logical formula in which each hierarchy is a traditional CNF formula. This representation can be translated into a fully hierarchical DNF, a fully flattened DNF, or a partially flattened DNF dictated by a depth parameter. A hierarchical search algorithm is then used to find the diagnoses. The hierarchical aspect of these two approaches is similar to ours. However, they require a hierarchical decomposition of the system to be either given as part of the input or obtained by hand, while our approach searches for hierarchies automatically. Another major difference is that they consider only the computation of diagnoses and do not address the problem of sequential diagnosis.

Out et al. (1994) considered two kinds of hierarchical models and discussed automatic methods for constructing their abstractions. In the first kind, components of the given detailed model are aggregated into single components of the abstract model. This guarantees that every diagnosis of the detailed model refined from a diagnosis of the abstract model is valid, so there is no need to check the validity of detailed diagnoses afterwards. In the second kind, the abstract model is constructed so that a unique diagnosis can always be determined at every level of the hierarchy at reasonable cost. The less costly measurements appear in the most abstract model, and the more costly measurements appear in the most detailed model. Torta and Torasso (2003, 2008) discussed further techniques for automatic abstraction based on system observability. These papers provide alternative techniques for automatic abstraction, but they do not address sequential diagnosis.
Based on the gde framework, de Kleer (2006) studied the sensitivity of diagnostic cost to what is called the ε-policy. This policy quantifies how the posterior probabilities of diagnoses are to be estimated when gde computes its heuristic. In our case, probabilities of diagnoses are not required at all, and all the other required probabilities can be computed exactly by evaluating and differentiating the d-DNNF. Nevertheless, our algorithm can be sensitive to the initial probabilistic model, and sensitivity analysis in this regard may lead to interesting findings.
Recently, Flesch, Lucas, & van der Weide (2007) proposed a new framework to integrate probabilistic reasoning into model-based diagnosis.The framework is based upon the notion of conflict measure, which originated as a tool for the detection of conflicts between an observation and a given Bayesian network (Jensen, 2001).When a system is modeled as a Bayesian network for diagnostic reasoning, it is possible to use this conflict measure to differentiate between diagnoses according to their degree of consistency with a given set of observations.This work, however, does not address the problem of sequential diagnosis, i.e., locating actual faults by taking measurements.
Most recently, Feldman, Provan, and van Gemund (2009) proposed a related method for reducing diagnostic uncertainty.While our work attempts to identify the actual faults with the fewest individual measurements, their heuristic was aimed at reducing the number of diagnoses with the fewest test vectors.

Conclusion
We have presented a new system for sequential diagnosis, called sda, that employs four new structure-based techniques to scale diagnosis to larger systems. First, it uses a heuristic for measurement selection that can be computed efficiently from the d-DNNF compilation of the system. Second, to diagnose larger systems, it automatically computes a structural abstraction of the system and performs diagnosis hierarchically. Third, it employs a structure-based technique that further reduces the abstraction size of the system, which scales diagnosis to the largest benchmark systems. Finally, it can automatically select the abstraction of the system that is more likely to give optimal average cost.

(determinism). For example, Figure 6 shows a d-DNNF compilation of the subcircuit in the dotted box of Figure 1 under the observation A ∧ P ∧ D. Given a d-DNNF compilation, the probability Pr(E = e) for an instantiation e of any set of variables E can be obtained by the following linear-time procedure:
(i) set all variables E to Boolean constants according to the instantiation e;
(ii) set all other literals (not in E) to true, except those that have numbers associated with them (a negative literal is associated with 1 minus the number of the corresponding positive literal); and
(iii) evaluate the d-DNNF bottom-up, treating true as 1, false as 0, the remaining leaves as their associated numbers, or-nodes as additions, and and-nodes as multiplications.
The number at the root is Pr(E = e). For example, Figure 6 shows the computation of the probability of J = 1 given the observation A ∧ P ∧ D, so e = {A = 1, P = 1, D = 1, J = 1}. In the d-DNNF, we set A = 1, P = 1, D = 1, J = 1, and ¬J = 0. The remaining literals take the values associated with them (discussed in Section 3.2).
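The three-step procedure above can be sketched in Python as follows. This is a minimal sketch under our own assumptions. The classes Lit, And, and Or, the function probability, and the toy d-DNNF are hypothetical and illustrative only; they are not the compilation shown in Figure 6. The toy models a single gate with a health variable ok (prior 0.9) and a fault parameter theta (0.5), compiled under an observation A.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Lit:
    var: str
    positive: bool = True


@dataclass(frozen=True)
class And:
    children: Tuple["Node", ...]


@dataclass(frozen=True)
class Or:
    children: Tuple["Node", ...]


Node = Union[Lit, And, Or]


def leaf_value(lit: Lit, evidence: Dict[str, bool], weights: Dict[str, float]) -> float:
    if lit.var in evidence:
        # Step (i): variables in E are set to constants according to e.
        return 1.0 if evidence[lit.var] == lit.positive else 0.0
    if lit.var in weights:
        # Step (ii): weighted literals keep their numbers; a negative literal
        # gets 1 minus the number of the corresponding positive literal.
        p = weights[lit.var]
        return p if lit.positive else 1.0 - p
    # Step (ii): every other literal is set to true.
    return 1.0


def probability(root: Node, evidence: Dict[str, bool], weights: Dict[str, float]) -> float:
    """Step (iii): bottom-up evaluation, or-nodes as sums, and-nodes as products."""
    cache: Dict[int, float] = {}

    def value(node: Node) -> float:
        key = id(node)
        if key in cache:  # shared sub-DAGs are evaluated once (linear time)
            return cache[key]
        if isinstance(node, Lit):
            result = leaf_value(node, evidence, weights)
        elif isinstance(node, And):
            result = 1.0
            for child in node.children:
                result *= value(child)
        else:
            result = sum(value(child) for child in node.children)
        cache[key] = result
        return result

    return value(root)


# Hypothetical toy d-DNNF compiled under observation A: if the gate is healthy
# ("ok"), its output J is 1; if faulty, J is 1 with probability "theta".
healthy = And((Lit("ok"), Lit("J")))
faulty = And((Lit("ok", False),
              Or((And((Lit("theta"), Lit("J"))),
                  And((Lit("theta", False), Lit("J", False)))))))
root = And((Lit("A"), Or((healthy, faulty))))
weights = {"ok": 0.9, "theta": 0.5}

p_obs = probability(root, {"A": True}, weights)               # Pr(A = 1)
p_joint = probability(root, {"A": True, "J": True}, weights)  # Pr(A = 1, J = 1)
print(p_joint / p_obs)  # posterior Pr(J = 1 | A = 1), approximately 0.95
```

Caching values by node identity keeps the pass linear in the size of the DAG. Dividing two such evaluations gives a posterior, which is how the probability of J = 1 given an observation is obtained in the example.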
Furthermore, a second, top-down traversal of the d-DNNF can effectively differentiate it, so that updated probabilities are computed at once for every possible change in the value of a variable (e.g., from unknown to known) (Darwiche, 2003). This is useful for our measurement point selection, where we need to update the entropies for all candidate measurement points.

by Darwiche (2001). If, however, k is greater than the minimum cardinality of the d-DNNF, then we need a similar but modified minimization algorithm to make sure we do not remove diagnoses of cardinality ≤ k.
While complete pruning is difficult to achieve in general, an approximation is possible. In a naive approach, one may remove every child l of every or-node n whose minimum cardinality mc(l) exceeds k. This is sound in that it never removes diagnoses of cardinality ≤ k, but it may prune too little in many cases. We can prune more by computing a local value k(n) for every node n from the global k for the whole d-DNNF, using a top-down traversal. Every node n suggests a value k(l) for its child l, and the largest of these suggestions becomes the final value of k(l); this is essential to avoid removing diagnoses of cardinality ≤ k. More pruning can occur this way because k(n) is often less than the global k. Once k(n) has been computed for every node, every child l of every or-node n for which mc(l) > k(l) can be pruned.
We now give the pruning algorithm, which performs a two-pass traversal of the d-DNNF. The value mc(n) is computed during the upward traversal and represents the minimum cardinality of diagnoses under node n. The value k(n) is computed during the downward traversal and represents the upper bound on fault cardinality for node n. This bound is used to prune those branches of an or-node whose mc exceeds it.
The two passes of the procedure are as follows. First, initialize mc(n) to 0 and k(n) to −∞ (the least possible value) for all n.

Upward pass: traverse the d-DNNF so that children are visited before parents.
- For every leaf node, set mc(n) to 1 if n is a negated health variable and 0 otherwise.
- For every or-node, set mc(n) to the minimum of the mc values of its children.
- For every and-node, set mc(n) to the sum of the mc values of its children.

Downward pass: traverse the d-DNNF so that parents are visited before children, and set k(n) for the root node to k.
- For every or-node n, remove every child p for which mc(p) > k(n). For every remaining child v, set k(v) to k(n) if k(n) > k(v).
- For every child p of every and-node n, let t_p be k(n) minus the sum of the mc values of all the other children, and set k(p) to t_p if t_p > k(p).
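The two passes can be sketched in Python as below. This is a minimal sketch under our own assumptions. The Node class, the helper children_first, and the function prune are hypothetical names. Leaves record only whether they are negated health literals. For an and-node n, we take the bound passed to child p as k(n) minus the sum of mc over p's siblings, since that is the fault budget the siblings leave for p.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(eq=False)  # identity-based equality/hashing, so shared sub-DAGs stay shared
class Node:
    kind: str                                   # "lit", "and", or "or"
    children: List["Node"] = field(default_factory=list)
    negated_health: bool = False                # leaves only: negated health literal?


def children_first(root: Node) -> List[Node]:
    """Post-order DFS: every node appears after all of its descendants."""
    order: List[Node] = []
    seen = set()

    def visit(node: Node) -> None:
        if id(node) in seen:
            return
        seen.add(id(node))
        for child in node.children:
            visit(child)
        order.append(node)

    visit(root)
    return order


def prune(root: Node, k: int):
    order = children_first(root)
    mc: Dict[int, float] = {}
    bound: Dict[int, float] = {id(n): -math.inf for n in order}  # k(n), initially -inf

    # Upward pass (children before parents): minimum cardinality under each node.
    for n in order:
        if n.kind == "lit":
            mc[id(n)] = 1 if n.negated_health else 0
        elif n.kind == "or":
            mc[id(n)] = min((mc[id(c)] for c in n.children), default=math.inf)
        else:  # and-node
            mc[id(n)] = sum(mc[id(c)] for c in n.children)

    # Downward pass (parents before children): propagate bounds and prune or-nodes.
    bound[id(root)] = k
    for n in reversed(order):
        kn = bound[id(n)]
        if kn == -math.inf:
            continue  # no surviving parent reaches this node any more
        if n.kind == "or":
            n.children = [c for c in n.children if mc[id(c)] <= kn]
            for v in n.children:
                bound[id(v)] = max(bound[id(v)], kn)  # keep the largest suggestion
        elif n.kind == "and":
            mcs = [mc[id(c)] for c in n.children]
            for i, p in enumerate(n.children):
                t_p = kn - sum(mcs[:i] + mcs[i + 1:])  # budget left by p's siblings
                bound[id(p)] = max(bound[id(p)], t_p)
    return mc, bound


# Toy example: diagnoses {A, B} (cardinality 2) and {B} (cardinality 1), with k = 1.
not_a = Node("lit", negated_health=True)
ok_a = Node("lit")
not_b = Node("lit", negated_health=True)
double = Node("and", [not_a, not_b])
single = Node("and", [ok_a, not_b])
root = Node("or", [double, single])
mc, bound = prune(root, k=1)
print(len(root.children))  # 1: the double-fault branch has been removed
```

Taking the maximum over all parents' suggestions keeps a shared node's bound safe for every path through it, which mirrors the "if k(n) > k(v)" and "if t_p > k(p)" conditions in the text.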
In the above procedure, the conditions k(n) > k(v) and t_p > k(p) used when updating k ensure that only a safe value of k is set. An example is shown in Figure 7, where the mc (left) and k (right) values are shown for each node. The branches labeled α, β, γ, and

that when circuit size is held constant, diagnostic cost generally increases with the number of cones in the circuit, which helped us devise the notion of abstraction cost.
The circuits are generated by composing a set of pre-formed building blocks.The latter consist of both gates and cones.The gates are taken from a pool of six gates of types OR, NOR, AND, NAND, NOT, and BUFFER, and the cones from a pool of eight cones, each of which has 10 gates and is extracted from ISCAS-85 benchmark circuits.
Our composition method is inspired by the method of generating random Bayesian networks described by Marinescu, Kask, and Dechter (2003). The circuits are generated according to a formula (N, P, I), where N is the number of components (building blocks) to use, P is the percentage of cones among the components, and I is the maximum number of inputs a gate can have. To generate the N components, we randomly pick (P/100) * N cones (with repetition) from the pool of cones and N − (P/100) * N gates (with repetition) from the pool of gates, and place them in a random order. The number of inputs of each gate is set randomly between 2 and I, except for NOT and BUFFER gates, which have only one input.
We then process each component as follows. Suppose that the components are placed in the order C_1, C_2, ..., C_N, and let P_i be the set of components that precede C_i in the order. When we process component C_i, we connect every input of C_i to the output of a randomly chosen component from P_i, such that no two inputs of C_i are connected to the same component. If an input of C_i cannot be connected, because P_i is empty or all the components in P_i have been used, it is treated as a primary input of the circuit. For example, the circuit in Figure 8 has been randomly generated according to the formula (8, 25, 2), where the components shown in boxes represent cones.
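The composition procedure for a formula (N, P, I) can be sketched in Python as follows. This is a minimal sketch under our own assumptions, and the names Component and generate_circuit are hypothetical. Cones are reduced to a name and an input count taken from a hypothetical cone_pool; the real cones are 10-gate subcircuits extracted from ISCAS-85. Rounding (P/100) * N to an integer is also our choice.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List

GATE_TYPES = ["OR", "NOR", "AND", "NAND", "NOT", "BUFFER"]


@dataclass
class Component:
    name: str
    kind: str  # a gate type, or the name of a cone from the pool
    inputs: List[str] = field(default_factory=list)  # driving components or primary inputs


def generate_circuit(n: int, p: int, max_inputs: int,
                     cone_pool: Dict[str, int], rng: random.Random) -> List[Component]:
    """Compose n building blocks for the formula (N, P, I) = (n, p, max_inputs).

    cone_pool maps a (hypothetical) cone name to its number of inputs.
    """
    num_cones = round(p / 100 * n)  # (P/100) * N; rounding is our assumption
    kinds = [rng.choice(list(cone_pool)) for _ in range(num_cones)]
    kinds += [rng.choice(GATE_TYPES) for _ in range(n - num_cones)]
    rng.shuffle(kinds)  # random order C_1, ..., C_N

    circuit: List[Component] = []
    next_pi = 0
    for i, kind in enumerate(kinds):
        if kind in ("NOT", "BUFFER"):
            arity = 1
        elif kind in GATE_TYPES:
            arity = rng.randint(2, max_inputs)
        else:
            arity = cone_pool[kind]
        # Connect inputs to distinct, randomly chosen predecessors (the set P_i) ...
        chosen = rng.sample(range(i), min(arity, i))
        inputs = [circuit[j].name for j in chosen]
        # ... and treat any input that cannot be connected as a primary input.
        while len(inputs) < arity:
            inputs.append(f"PI{next_pi}")
            next_pi += 1
        circuit.append(Component(f"C{i + 1}", kind, inputs))
    return circuit


cones = {"cone_a": 3, "cone_b": 4}  # hypothetical pool
circuit = generate_circuit(8, 25, 2, cones, random.Random(0))
for c in circuit:
    print(c.name, c.kind, c.inputs)
```

Sampling predecessors without replacement enforces the rule that no two inputs of C_i share a driver. Because connections only go to earlier components, the generated circuit is acyclic by construction.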
By varying the parameters (N, P, I) we can obtain circuits of varying size and structure.First we fix P = 25, I = 5 and vary N to generate a range of circuits of increasing size.For each N we generate 10 circuits.These circuits are summarized in Table 8.The numbers in the columns are averaged over all circuits of a given size, and rounded off.Generally, when N is increased we see an increase in the abstraction size as well as the estimated treewidth, corresponding to an increase in the perceived difficulty of the circuit (e.g., note that the largest circuit in this set is smaller than c1355, but the estimated treewidth of c1355 is much lower, at 25; the actual compilation was indeed harder for the former circuit).For each circuit we randomly generate 10 single-fault, 10 double-fault, and 10 five-fault scenarios and a single test case for each scenario.
The results of experiments with these circuits are given in Tables 9, 10, and 11, using the baseline, hierarchical, and cloning techniques, respectively.These results are generally consistent with those obtained using the ISCAS-85 circuits.The baseline sda could not solve any circuit beyond (72,25,5).The hierarchical sda solved more circuits but could not solve any circuit beyond (80,25,5).The most scalable version of sda, with component cloning, solved much larger circuits, up to (168,25,5).
Note that diagnostic cost generally increases with N, which is consistent with the intuitive expectation that diagnostic uncertainty grows with system size. Also note that diagnostic cost is often significantly higher for the hierarchical approach than for the baseline approach. As discussed earlier, the hierarchical approach often has to descend through hierarchies of cones to reach a faulty gate, which the baseline approach may be able to reach more directly.

Table 10: Hierarchical heuristic on randomly generated circuits (N, 25, 5).
We also observe that, again, pruning leads to a general improvement in diagnostic cost.The improvement is more significant for the hierarchical approach, which can be explained by the fact that the effect of pruning is much greater on the abstract model, as each branch pruned can correspond to a large part of the original system.
We now perform another set of experiments to study the impact of hierarchy in a controlled manner.This time we hold the size of the circuits more or less constant and vary the percentage of cones in them.Specifically, we generate a large number of random circuits with P ranging from 0 to 50, such that for each value of P the generated circuits contain 120 gates on average.
Table 12: Component cloning on randomly generated circuits (N, P, 5).

The experiments on these circuits are summarized in Table 12. Note that as P increases, the estimated treewidth of the circuits decreases, as would be expected, and the actual compilation time also decreases. The diagnostic cost, on the other hand, increases steadily up to P = 25 and remains more or less flat afterwards. This confirms the potential negative impact of hierarchy on diagnostic cost that we hypothesized. As P increases, a fault becomes more likely to occur inside a cone, so on average one has to take more measurements, many of them on inputs to cones, to locate a fault. The circuit size is fixed at roughly 120 gates, and each cone contributes 10 gates. Once P reaches a certain point, very few gates lie outside cones, so the likelihood of a fault occurring in a cone more or less plateaus. This is consistent with the diagnostic cost not increasing further after P = 25.
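The plateau argument can be made concrete with a back-of-the-envelope calculation. This is our own sketch, not a result from the experiments. It assumes exactly (P/100)N cones of 10 gates each, N − (P/100)N stand-alone gates, and a single fault equally likely at every gate (the uniform-fault assumption is ours). Under these assumptions, the fraction f(P) of gates lying inside cones, and hence the probability that the fault is inside a cone, is:

```latex
\begin{aligned}
G &= 10\cdot\tfrac{P}{100}\,N + \left(1-\tfrac{P}{100}\right)N
   = N\left(1+\tfrac{9P}{100}\right) \approx 120,\\
f(P) &= \frac{10\cdot\tfrac{P}{100}\,N}{G} = \frac{10P}{100+9P},\\
f(0) &= 0,\quad f(10)\approx 0.53,\quad f(25)\approx 0.77,\quad f(50)\approx 0.91.
\end{aligned}
```

The fraction depends only on P. Raising P from 25 to 50 increases it only from about 0.77 to 0.91, in line with the observed flattening of diagnostic cost.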

Figure 3: A faulty circuit with faults at B and J.

Figure 4: Creating a clone B′ of B according to D.

Figure 6: d-DNNF compilation of subcircuit (dotted) in Figure 1 given the observation A ∧ P ∧ D and computation of the posterior probability of J = 1.

Table 4: Results of the preprocessing step of cloning.

Table 5: Effect of component cloning on diagnostic performance.

Table 6: Hierarchical sequential diagnosis with component cloning (c499 and c1355 are omitted because they are already easy to diagnose and cloning does not reduce their abstraction).

Table 7: Effectiveness of diagnostic cost estimation.