Sum-of-Products with Default Values: Algorithms and Complexity Results

Weighted Counting for Constraint Satisfaction with Default Values (#CSPD) is a powerful special case of the sum-of-products problem that admits succinct encodings of #CSP, #SAT, and inference in probabilistic graphical models. We investigate #CSPD under the fundamental parameter of incidence treewidth (i.e., the treewidth of the incidence graph of the constraint hypergraph). We show that if the incidence treewidth is bounded, then #CSPD can be solved in polynomial time. More specifically, we show that the problem is fixed-parameter tractable for the combined parameter incidence treewidth, domain size, and support size (the maximum number of non-default tuples in a constraint), generalizing a known result on the fixed-parameter tractability of #CSPD under the combined parameter primal treewidth and domain size. We further prove that the problem is not fixed-parameter tractable if any of the three components is dropped from the parameterization.


Introduction
Sum-of-products is a well-studied framework that captures many important tasks (Dechter, 1999; Bacchus, Dalmao, & Pitassi, 2009). Among others, it captures problems such as the counting constraint satisfaction problem (#CSP), the propositional model counting problem (#SAT), and the inference problem in probabilistic graphical models (PGMs). Here, we consider a natural formalization of sum-of-products in the terminology of the Constraint Satisfaction Problem (CSP): Weighted Counting for Constraint Satisfaction with Default Values (#CSPD). #CSPD extends the standard CSP formalism by adding (i) a rational weight to each tuple in a constraint relation, as well as (ii) a default weight for each constraint, indicating the weight of assignments not represented by a tuple in the relation.
The weight of an assignment is the product of the weights of all constraints under that assignment, and the value of a #CSPD instance is the sum of these weights taken over all total assignments. For example, an instance of #SAT can be represented by introducing, for each clause, a constraint with default weight 1 containing a single tuple with weight 0. Conditional probability tables of a Bayesian network (Pearl, 1988) can be directly encoded as constraints with tuple weights corresponding to conditional probabilities. Additionally, default values can be used to represent uniform probability distributions succinctly. Canonical algorithms for the sum-of-products problem run in polynomial time for instances of bounded primal treewidth, which is the treewidth of the graph whose vertices are the variables, and where two variables are adjacent if and only if they appear together in the scope of a constraint (Dechter, 1999; Bacchus et al., 2009; Kask, Dechter, Larrosa, & Dechter, 2005). A runtime bound of this kind also holds for a variable elimination procedure tailored to #CSPD (Capelli, 2016). However, an instance of primal treewidth k may only contain relations of arity up to k + 1, so one can afford to expand any succinctly represented relation to a table of size n^{O(k)}. We therefore need a more fine-grained measure than primal treewidth to capture the advantages afforded by the use of default values.
Our main contribution is an algorithm, laid out in detail in Section 3, that solves #CSPD in polynomial time for instances of bounded incidence treewidth (the treewidth of the bipartite graph on variables and constraints where a variable and a constraint are adjacent if and only if the variable appears in the scope of the constraint).[1] This result is significant since incidence treewidth is more general than primal treewidth: an instance of primal treewidth k has incidence treewidth at most k + 1, whereas there are instances of bounded incidence treewidth and arbitrarily large primal treewidth (see, e.g., Samer & Szeider, 2010b).
In the context of CSP and inference in PGMs, efforts toward obtaining even finer-grained measures have led to the development of generalized hypertree decompositions (GHDs) (Gottlob et al., 2005) and GHD-based inference algorithms (Kask et al., 2005). Recently, it was shown that the sum-of-products problem can be solved in polynomial time if a measure of GHDs known as the fractional hypertree width is bounded (Khamis, Ngo, & Rudra, 2016). This bound requires that factors/constraints are given in a format where each nonzero tuple is represented explicitly. It is unlikely that a similar bound can be obtained for #CSPD because #SAT (and thus #CSPD) is #P-hard already for instances with acyclic constraint hypergraphs (Samer & Szeider, 2010a).
Our algorithm is elementary and combinatorial. It is based on dynamic programming along a tree decomposition, with the key ingredient being a notion of projection, which allows us to store the effect of partial assignments locally in dynamic programming tables (Paulusma, Slivovsky, & Szeider, 2016; Slivovsky & Szeider, 2013; Saether, Telle, & Vatshelle, 2015). The running time of our algorithm for #CSPD is polynomial, where the order of the polynomial depends on the incidence treewidth. In Section 4, we identify additional restrictions under which the algorithm runs in uniform polynomial time, i.e., where the degree of the polynomial does not depend on the incidence treewidth. Problems that such an algorithm can solve are called fixed-parameter tractable (Carbonnel & Cooper, 2016; Cygan, Fomin, Kowalik, Lokshtanov, Marx, Pilipczuk, Pilipczuk, & Saurabh, 2015; Downey & Fellows, 1999; Gottlob & Szeider, 2008). More specifically, we show that #CSPD is fixed-parameter tractable for the combined parameter consisting of the incidence treewidth, the domain size, and the maximum number of tuples present in a constraint. We also show that none of these three components of the parameter can be dropped without losing fixed-parameter tractability.

1. Inference in PGMs is known to be tractable for instances whose incidence graph is a tree (Barber, 2012, Ch. 5). CSP without counting or weights, where constraints can be represented either by allowed or by forbidden tuples, has also been addressed by Cohen, Green, and Houghton (2009) and by Chen and Grohe (2010); the latter work also obtains tractability results for such variants of CSP when the incidence treewidth is bounded.

Preliminaries
In this section, we formalize the sum-of-products problem in terms of a weighted constraint satisfaction problem. Constraints are specified by weighted tuples; a default weight is provided for missing tuples. It is crucial to represent constraints of large arity succinctly. In such cases, we can utilize default values for a succinct representation. We also define the graph invariant treewidth and apply it to weighted constraint satisfaction instances via primal and incidence graphs. Since the treewidth of incidence graphs does not bound the arity of constraints, a succinct representation of constraints is critical in that setting.

Weighted Constraint Satisfaction with Default Values
Let V be a set of variables and D a finite set of values (the domain). A weighted constraint C of arity ρ over D with default value η ∈ Q is a tuple C = (S, F, f, η) where S = (x 1 , . . . , x ρ ) is a tuple of variables from V called the scope, F ⊆ D^ρ is called the support, and f : F → Q is a mapping that assigns rational weights to the tuples in the support.
Here, Q denotes the set of rational numbers. We define |C| = |S| + |F| + 1 and var(C) = S. Since all the weighted constraints we consider will have a default value, we will use weighted constraint for brevity instead of weighted constraint with default value. An (unweighted) constraint is defined analogously to a weighted constraint, but without the components f and η. An assignment α : X → D is a mapping defined on a set X ⊆ V of variables; if X = V then α is a total assignment. An assignment α' : X' → D extends α if X ⊆ X' and α'(x) = α(x) for all x ∈ X. A weighted constraint C = (S, F, f, η) naturally induces a total function on assignments of its scope S = (x 1 , . . . , x ρ ): for each assignment α : X → D where X ⊇ S, we define the value C(α) of C under α as C(α) = f(α(x 1 ), . . . , α(x ρ )) if (α(x 1 ), . . . , α(x ρ )) ∈ F and C(α) = η otherwise.
An instance I of #CSPD is a tuple (V, D, C) where V = var(I) is the set of variables of I, D is its domain, and C is a set of weighted constraints over D. We define |I| as the sum of |V|, |D|, and |C| for each C ∈ C. The task in #CSPD is to compute the total weight of all assignments of V, i.e., to compute the value

sol(I) = Σ_{α : V → D} Π_{C ∈ C} C(α).

We observe that every instance of the classical #CSP problem can be straightforwardly translated into an instance of #CSPD: for each constraint in the #CSP instance, we create a weighted constraint, add the tuples of the constraint into F, have f map these to the value 1, and set the default value to 0. Similarly, every instance of #SAT can also be represented as an instance of #CSPD: for each clause, we create a corresponding weighted constraint and let F contain only the single tuple that does not satisfy that clause. We let f map this tuple to 0 and set η = 1. Naturally, #CSPD also generalizes the weighted counting variants of #CSP and #SAT, but it is also significantly more powerful than each of these formalisms on their own; for instance, it allows us to perform weighted counting for the Mixed CSP problem (Cohen et al., 2009), where each constraint can be represented either explicitly or by a clause.
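The value sol(I) can be computed by brute-force enumeration over all total assignments, which serves as a reference implementation for small instances. The sketch below (function and variable names are ours, not from the paper) encodes the #SAT clause (x ∨ y) as a weighted constraint with default weight 1 and a single zero-weight tuple for the falsifying assignment.

```python
from itertools import product
from fractions import Fraction

# A weighted constraint C = (S, F, f, eta) is modeled as a triple
# (scope, weights, eta): scope is a tuple of variables, weights maps
# the support tuples to their rational weights, eta is the default.
def constraint_value(scope, weights, eta, alpha):
    t = tuple(alpha[x] for x in scope)
    return weights.get(t, eta)

def sol(variables, domain, constraints):
    """Brute-force sum over all total assignments of the product of
    constraint values (exponential; for reference on tiny instances)."""
    total = Fraction(0)
    for values in product(domain, repeat=len(variables)):
        alpha = dict(zip(variables, values))
        w = Fraction(1)
        for (scope, weights, eta) in constraints:
            w *= constraint_value(scope, weights, eta, alpha)
        total += w
    return total

# Encode #SAT for the single clause (x OR y): the only falsifying
# tuple (0, 0) gets weight 0, all other tuples take the default 1.
clause = (("x", "y"), {(0, 0): Fraction(0)}, Fraction(1))
print(sol(["x", "y"], [0, 1], [clause]))  # 3 satisfying assignments
```

As expected, the value of the instance counts the satisfying assignments of the clause.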
We use standard graph terminology; see Diestel's textbook (2012). The primal graph of a #CSPD instance I is the graph whose vertices correspond to the variables of I and where two variables a, b are adjacent if and only if there exists a weighted constraint in I whose scope contains both a and b. The incidence graph of I is the bipartite graph whose vertices correspond to the variables and weighted constraints of I, and where vertices corresponding to a variable x and a weighted constraint C are adjacent if and only if x ∈ var(C).

Treewidth
Let G be a graph. A tree decomposition of G is a pair (T, χ) where T is a tree and χ : V(T) → 2^{V(G)} is a mapping from tree nodes to subsets of V(G) such that: (1) every vertex of G occurs in χ(t) for some t ∈ V(T); (2) for every edge uv of G there is a node t ∈ V(T) with u, v ∈ χ(t); and (3) for every vertex v of G, the set of nodes t with v ∈ χ(t) induces a connected subtree of T. We call the vertices of T nodes and the sets χ(t) bags of the tree decomposition (T, χ). The width of (T, χ) is equal to max{ |χ(t)| − 1 : t ∈ V(T) }. The treewidth of G, denoted tw(G), is the minimum width over all tree decompositions of G. A tree decomposition (T, χ) is called nice if T is rooted and the following conditions hold: every node of T has at most two children; if a node t has two children t 1 and t 2 , then t is called a join node and χ(t) = χ(t 1 ) = χ(t 2 ); if a node t has one child t 1 , then either |χ(t)| = |χ(t 1 )| + 1 and χ(t 1 ) ⊂ χ(t) (in this case we call t an introduce node) or |χ(t)| = |χ(t 1 )| − 1 and χ(t) ⊂ χ(t 1 ) (in this case we call t a forget node); and the root r of T satisfies χ(r) = ∅.
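The defining conditions of a tree decomposition can be checked directly. The following sketch (names ours; it assumes the three standard conditions stated above) verifies a width-2 decomposition of the 4-cycle.

```python
def is_tree_decomposition(graph_edges, vertices, tree_edges, bags):
    """Check the tree-decomposition conditions; bags maps tree node -> set."""
    # (1) every vertex of G occurs in some bag
    if not all(any(v in b for b in bags.values()) for v in vertices):
        return False
    # (2) every edge of G is contained in some bag
    if not all(any(u in b and v in b for b in bags.values())
               for (u, v) in graph_edges):
        return False
    # (3) for each vertex, the tree nodes whose bags contain it are connected
    for v in vertices:
        nodes = {t for t, b in bags.items() if v in b}
        seen, stack = set(), [next(iter(nodes))]
        while stack:  # flood-fill inside `nodes` using only tree edges
            t = stack.pop()
            if t in seen:
                continue
            seen.add(t)
            for (a, b) in tree_edges:
                if a == t and b in nodes:
                    stack.append(b)
                if b == t and a in nodes:
                    stack.append(a)
        if seen != nodes:
            return False
    return True

# A decomposition of the 4-cycle 1-2-3-4-1 with two bags on a path t1 - t2;
# its width is max bag size minus one, i.e., 2.
bags = {"t1": {1, 2, 4}, "t2": {2, 3, 4}}
print(is_tree_decomposition([(1, 2), (2, 3), (3, 4), (1, 4)],
                            [1, 2, 3, 4], [("t1", "t2")], bags))  # True
```

Dropping vertex 4 from the first bag would violate condition (2) for the edge {1, 4}, and the check would return False.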
It is possible to transform a tree decomposition (T, χ) into a nice tree decomposition (T', χ') of the same width in time O(|V| + |E|) (Kloks, 1994). Furthermore, it is possible to construct near-optimal tree decompositions for graphs of low treewidth efficiently:

Fact 1 (Bodlaender et al., 2016). There exists an algorithm which, given an n-vertex graph G and an integer k, in time 2^{O(k)} · n either outputs a tree decomposition of G of width at most 5k + 4 with O(n) nodes, or determines that tw(G) > k.
The primal treewidth (tw) of a #CSPD instance I is the treewidth of its primal graph, and similarly, the incidence treewidth (tw*) of I is the treewidth of its incidence graph.

Solving #CSPD Using Incidence Treewidth
Here we show that #CSPD can be solved in polynomial time when restricted to instances of bounded incidence treewidth. We remark that, in parameterized complexity terminology, the algorithm is an XP algorithm. However, before we proceed to the algorithm itself, we will need to introduce projections, which are instrumental in defining the records used by our dynamic programming algorithm.

Projections
Let C = (S, F) be an unweighted constraint where S = (x 1 , . . . , x l ) and let τ : X → D be an assignment. The projection of C with respect to the assignment τ is the constraint C|τ = (S, F'), where F' is the set of tuples of F compatible with τ, formally

F' = { (d 1 , . . . , d l ) ∈ F : τ(x i ) = d i for all x i ∈ S with x i ∈ X }.

The algorithm presented in Section 3.2 groups assignments based on their projections. The key insight is that two assignments τ and σ are indistinguishable for a constraint C if the projections C|τ and C|σ are identical. The projection C|τ of a weighted constraint C = (S, F, f, η) with respect to an assignment τ is simply the projection of its associated unweighted constraint (S, F) with respect to τ.
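The projection operation keeps exactly the support tuples that agree with the partial assignment on the assigned scope variables. A minimal sketch (names ours):

```python
def project(scope, support, tau):
    """Projection C|tau: keep the tuples of the support compatible with
    the partial assignment tau (a dict on a subset of the variables)."""
    kept = {t for t in support
            if all(x not in tau or tau[x] == t[i]
                   for i, x in enumerate(scope))}
    return (scope, kept)

scope = ("x1", "x2", "x3")
support = {(0, 1, 0), (1, 1, 0), (0, 0, 1)}
print(project(scope, support, {"x1": 0}))
# keeps (0, 1, 0) and (0, 0, 1): the tuples whose first entry is 0
```

Variables of the scope that are not assigned by tau impose no restriction, so projecting with respect to the empty assignment returns the constraint unchanged.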
We write C[X] to denote the set of projections of C with respect to assignments of X. The following observation notes that C[X] is not too large; this contrasts with the fact that the number of assignments of X may be exponential in the size of X.
Observation 1. Let C = (S, F) be a constraint and let X be a set of variables. The following (in)equalities hold:

|C[X]| ≤ |F| + 1 and F = ⋃ { F' : (S, F') ∈ C[X] }.

Moreover, the union in the second bound is disjoint; in particular, the supports of distinct projections in C[X] are disjoint.
The projection of a constraint with respect to the union of two assignments can be computed from the projections of this constraint with respect to the individual assignments. We define the intersection of two unweighted constraints C 1 = (S, F 1 ) and C 2 = (S, F 2 ) with the same scope (which in the following will be projections of the same constraint) as C 1 ∩ C 2 = (S, F 1 ∩ F 2 ).

Observation 2. Let C be a constraint and let τ : X → D and σ : Y → D be assignments that agree on X ∩ Y. Then C|(τ ∪ σ) = C|τ ∩ C|σ.
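The compatibility of projection with unions of assignments can be checked exhaustively on a small example. The sketch below (names ours) verifies, for a fixed support, that projecting under the union of two assignments equals the intersection of the individual projections.

```python
from itertools import product

def project(scope, support, tau):
    """Support of the projection C|tau (scope is left implicit here)."""
    return {t for t in support
            if all(x not in tau or tau[x] == t[i]
                   for i, x in enumerate(scope))}

scope = ("x1", "x2", "x3")
support = {(0, 0, 1), (0, 1, 1), (1, 1, 0)}

# Exhaustively check C|(tau ∪ sigma) = C|tau ∩ C|sigma for all
# assignments tau of {x1} and sigma of {x3} (disjoint domains, so
# the two assignments trivially agree on their intersection).
ok = all(
    project(scope, support, {"x1": a, "x3": b})
    == project(scope, support, {"x1": a}) & project(scope, support, {"x3": b})
    for a, b in product([0, 1], repeat=2))
print(ok)  # True
```

The same identity is what later allows the dynamic program to combine the projections stored at different parts of the tree decomposition.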
The value C(τ) of a constraint under a complete assignment τ can be obtained from the projection C|τ in the following way. Let C = (S, F, f, η) be a weighted constraint and B = (S, F') a projection of C under an assignment of X ⊇ S; note that F' is either empty or contains a single tuple s. We define val(C, B) as val(C, B) = η in the former case and val(C, B) = f(s) in the latter case.
Observation 3. For every assignment τ : X → D and constraint C with scope S ⊆ X we have val (C, C| τ ) = C(τ ).
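Observation 3 can be checked directly on a toy weighted constraint. The sketch below (names ours) compares val(C, C|tau) with C(tau) over all total assignments of a two-variable scope.

```python
from itertools import product

scope = ("x", "y")
weights = {(0, 1): 5, (1, 1): 7}   # support F with its weights f
eta = 1                            # default value

def project(tau):
    """Support of C|tau for a total assignment tau of the scope."""
    return {t for t in weights
            if all(tau[x] == t[i] for i, x in enumerate(scope))}

def val(projected):
    """val(C, B): default if the projected support is empty, otherwise
    the weight of its unique surviving tuple."""
    if not projected:
        return eta
    (t,) = projected  # a projection under a total assignment keeps <= 1 tuple
    return weights[t]

def constraint_value(tau):
    t = tuple(tau[x] for x in scope)
    return weights.get(t, eta)

# Observation 3: val(C, C|tau) = C(tau) for every total assignment tau.
print(all(val(project({"x": a, "y": b})) == constraint_value({"x": a, "y": b})
          for a, b in product([0, 1], repeat=2)))  # True
```

This is the step that lets the algorithm charge a forgotten constraint's weight using only its stored projection, without revisiting the assignment itself.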

The Algorithm
For this section, let I = (V, D, C) be an arbitrary but fixed instance of #CSPD, and let (T, χ) be a nice tree decomposition of its incidence graph. Let t ∈ V(T) be a node of this tree decomposition. We refer to a vertex v (variable or constraint) as forgotten below t if there is a forget node t'' in the subtree rooted at t such that χ(t'') = χ(t') \ {v}, where t' is the child node of t''. We write X t = χ(t) ∩ V for the set of variables in the bag of t, Y t for the set of variables forgotten below t, and Z t = X t ∪ Y t for their union. Furthermore, we write C t = χ(t) ∩ C for the set of constraints in the bag of t and F t for the set of constraints forgotten below t. Our goal is to compute the weight of assignments τ : Z t → D restricted to F t , that is, we want to compute the value of the following expression:

Σ_{τ : Z t → D} Π_{C ∈ F t } C(τ).   (1)

Since every variable and constraint is eventually forgotten, expression (1) computes sol(I) at the root of T. To perform dynamic programming, we will split the set D^{Z t } into equivalence classes that keep track of the influence of assignments on the constraints in C \ F t (i.e., constraints that have not yet been forgotten). Let τ : Z t → D be an assignment and let C ∈ C \ F t . How can τ affect the constraint C? If C ∉ C t then var(C) cannot contain variables forgotten below t since (T, χ) is a tree decomposition, so the effect of τ on C can be determined purely in terms of the restricted assignment τ|X t . On the other hand, if C ∈ C t then the effect of τ on C can be characterized by the projection of C with respect to τ (recall that the projection of a weighted constraint is simply the projection of the underlying unweighted constraint). To simplify the presentation of the following arguments, we will assume an ordering on the set of constraints in each bag. Let C t = (C 1 , . . . , C p ) be the constraints associated with node t. Let σ ∈ D^{X t } be an assignment and let B = (B 1 , . . . , B p ) be a vector where each B i ∈ C i [Z t ]. We define A t (σ, B) as the set of assignments of Z t compatible with the assignment σ and the projections B i , that is,

A t (σ, B) = { τ : Z t → D : τ|X t = σ and C i |τ = B i for each 1 ≤ i ≤ p }.

The sets A t (σ, B) yield a partition of the assignments in D^{Z t }, since σ varies over all assignments of X t , the B i vary over all projections of a constraint under an assignment of Z t , and the projection of any constraint with respect to an assignment is unique. One can think of the pair (σ, B) as the "state" of the bag X t ∪ C t induced by an assignment of Z t and of A t (σ, B) as the set of all assignments of Z t which result in a particular state (σ, B) at node t. For each node t ∈ T and each pair (σ, B), we will compute and store values Q t (σ, B) where

Q t (σ, B) = Σ_{τ ∈ A t (σ, B)} Π_{C ∈ F t } C(τ).

In the following, we will argue that the values Q t (σ, B) can be computed from the values Q t' (σ', B') associated with the child nodes t' of t. To simplify notation, we may omit the names of nodes in the subscripts for nodes t with a single child node t'. For instance, we will write X instead of X t , A instead of A t , and so forth. Moreover, we will use primes when referring to objects associated with t' and write X' instead of X t' , A' instead of A t' , and so on. Further, for a variable x and an assignment σ whose domain includes x, we let σ_x denote the restriction of σ to {x}. For each domain value d ∈ D, we let σ_x^d : {x} → D denote the assignment such that σ_x^d(x) = d. For a vector B = (B 1 , . . . , B l ) of constraints and a single constraint P we will write (B, P) = (B 1 , . . . , B l , P) for their concatenation.
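The partition of D^{Z t } into the classes A t (σ, B) can be illustrated on a toy bag. The sketch below (all names and the toy data are hypothetical) groups every assignment of Z_t = {x, y, z} with X_t = {x} by its state (sigma, B_1) for a single bag constraint with scope (y, z), and confirms that the classes partition all 2^3 assignments.

```python
from itertools import product

# Toy bag data: Z_t = {x, y, z}, X_t = {x}, C_t = (C1,) where C1 has
# scope (y, z) and support {(0, 0), (1, 0)}.
Zt, Xt = ("x", "y", "z"), ("x",)
scope, support = ("y", "z"), {(0, 0), (1, 0)}

def project(tau):
    """Projection of C1 with respect to a total assignment of Z_t."""
    return frozenset(t for t in support
                     if all(tau[v] == t[i] for i, v in enumerate(scope)))

# Group every assignment of Z_t by its state (sigma, B_1).
groups = {}
for values in product([0, 1], repeat=len(Zt)):
    tau = dict(zip(Zt, values))
    state = (tuple(tau[v] for v in Xt), project(tau))
    groups.setdefault(state, []).append(values)

# The classes A_t(sigma, B) partition all 2^3 assignments of Z_t.
print(sum(len(g) for g in groups.values()))  # 8
print(len(groups))                           # 6 distinct states
```

Note that the number of states is governed by the support size rather than by the number of assignments of Z_t, which is the point of Observation 1.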
We first consider variable introduce nodes, that is, nodes t with a unique child node t' such that X = X' ∪ {x} for some variable x. Variable x is also included in the set Z, and each assignment τ : Z → D is obtained from an assignment τ' : Z' → D by extending it with the singleton assignment σ_x : {x} → D, where σ_x(x) = τ(x). If τ' ∈ A'(σ', B'), we can simply extend σ' by σ_x and take the projection of each B' i ∈ B' with respect to σ_x to obtain the assignment σ and vector B such that τ ∈ A(σ, B). Since the new variable x cannot occur in forgotten constraints, the values Q(σ, B) and Q'(σ', B') coincide.
Lemma 1. Let t be a variable introduce node with child t', and let x be the variable introduced by t. Further, let C t = (C 1 , . . . , C p ), let σ : X → D be an assignment, and let B = (B 1 , . . . , B p ) be a vector such that A(σ, B) is nonempty. Then there is a vector B' = (B' 1 , . . . , B' p ) with B i = B' i |σ_x for each i such that the mapping f : τ ↦ τ|Z' is a bijection between A(σ, B) and A'(σ', B'), where σ' = σ|X'. Moreover, Q(σ, B) = Q'(σ', B') in this case.
Proof. Suppose A(σ, B) is nonempty and let ξ ∈ A(σ, B). We let ξ' = ξ|Z' and define B' i = C i |ξ' for each i. Since C i |ξ = B i for each i, this definition satisfies B i = B' i |σ_x by Observation 2. Let τ ∈ A(σ, B) be an assignment and let τ' = τ|Z' denote its image under f. It is trivially the case that τ' ∈ A'(σ', C̄), where C̄ = (C 1 |τ', . . . , C p |τ'). We argue that C i |τ' = C i |ξ' for all τ, ξ ∈ A(σ, B). If the projections B' i = C i |ξ' and C i |τ' are distinct, then by Observation 1 they must be disjoint. But since C i |τ = C i |τ' ∩ C i |σ_x and C i |ξ = C i |ξ' ∩ C i |σ_x by Observation 2, in that case the projections C i |τ and C i |ξ would have to be disjoint as well, a contradiction. We conclude that C i |τ' = B' i and thus τ' ∈ A'(σ', B'). This proves that f maps A(σ, B) into A'(σ', B'). Since f is clearly injective, it remains to show that the mapping is surjective as well. Let τ' ∈ A'(σ', B') and let τ = τ' ∪ σ_x so that f(τ) = τ'. Then τ|X = σ and, by Observation 2, C i |τ = C i |τ' ∩ C i |σ_x = B' i |σ_x = B i , so τ ∈ A(σ, B). We conclude that f is a bijection as claimed. Since (T, χ) is a tree decomposition, the newly introduced variable x does not occur in any constraint forgotten below t, so the assignments τ and f(τ) always have the same weight. It follows that Q(σ, B) = Q'(σ', B').
Next, we consider constraint introduce nodes t such that C t = C t' ∪ {C} for a constraint C. A newly introduced constraint C cannot contain forgotten variables, so its projection with respect to an assignment τ of Z is just the projection with respect to the restriction of τ to X.

Lemma 2. Let t be a constraint introduce node with child node t' such that C t' = (C 1 , . . . , C p−1 ) and C t = (C 1 , . . . , C p−1 , C). Further, let σ : X t → D be an assignment, let B = (B 1 , . . . , B p ) be a vector of constraints, and let B' = (B 1 , . . . , B p−1 ) be the vector consisting of its first p − 1 components. The following statements hold: if B p ≠ C|σ, then Q(σ, B) = 0; if B p = C|σ, then A(σ, B) = A'(σ, B') and Q(σ, B) = Q'(σ, B').

Proof. We must have B p = C|σ in order for Q(σ, B) to be nonzero, since the newly introduced constraint C cannot contain variables forgotten below t. If B p = C|σ then it is readily verified that A(σ, B) = A'(σ, B'). Since F = F', the lemma follows.
A variable forget node t satisfies X = X' \ {x} for some variable x. Upon forgetting a variable x, we sum up the values Q'(σ ∪ σ_x^d, B) over all domain values d ∈ D.

Lemma 3. Let t be a variable forget node with child t', and let x be the variable forgotten by t. Let σ : X → D be an assignment and let B = (B 1 , . . . , B p ) be a vector of constraints. Then

Q(σ, B) = Σ_{d ∈ D} Q'(σ ∪ σ_x^d, B).

Proof. Since Z = Z', every assignment τ ∈ A(σ, B) satisfies τ ∈ A'(σ ∪ σ_x^{τ(x)}, B), so A(σ, B) = ⋃_{d ∈ D} A'(σ ∪ σ_x^d, B). The lemma now follows since F = F' and the union is disjoint.
For a constraint forget node t we have C t = C t' \ {C} for some constraint C. As C is added to the set of forgotten constraints, we have to include it in our weight calculations for Q(σ, B).

Lemma 4. Let t be a constraint forget node with child t' such that C t' = (C 1 , . . . , C p , C) and C t = (C 1 , . . . , C p ). Let σ : X → D be an assignment and let B = (B 1 , . . . , B p ) be a vector of constraints. Then

Q(σ, B) = Σ_{P ∈ C[Z]} val(C, P) · Q'(σ, (B, P)).
Proof. Let P ∈ C[Z] be a projection and let τ ∈ A'(σ, (B, P)). Since C is forgotten at node t we have F = F' ∪ {C}, and the sets A'(σ, (B, P)) for P ∈ C[Z] form a disjoint partition of A(σ, B). Moreover, var(C) ⊆ Z since we are forgetting C, so val(C, P) is defined and val(C, P) = C(τ) by Observation 3. Putting everything together, we get

Q(σ, B) = Σ_{τ ∈ A(σ, B)} Π_{C' ∈ F} C'(τ) = Σ_{P ∈ C[Z]} Σ_{τ ∈ A'(σ, (B, P))} C(τ) · Π_{C' ∈ F'} C'(τ) = Σ_{P ∈ C[Z]} val(C, P) · Q'(σ, (B, P)).
Finally, we deal with join nodes t that have child nodes t 1 , t 2 such that χ(t) = χ(t 1 ) = χ(t 2 ). In keeping with the presentation of the previous lemmas, we will simplify subscripts by writing, for instance, Z i instead of Z t i and A i instead of A t i for i ∈ {1, 2}. Further, we use the following notation: given two vectors B 1 = (B 1 , . . . , B p ) and B 2 = (B' 1 , . . . , B' p ) of constraints such that B i and B' i have the same scope for each i ∈ [p], we write B 1 ∩ B 2 = (B 1 ∩ B' 1 , . . . , B p ∩ B' p ) for the vector obtained by taking the componentwise intersections.
Lemma 5. Let t be a join node with children t 1 and t 2 , let σ : X → D be an assignment, and let B = (B 1 , . . . , B p ) be a vector of constraints. We have

Q(σ, B) = Σ_{B 1 ∩ B 2 = B} Q 1 (σ, B 1 ) · Q 2 (σ, B 2 ),

where the sum is taken over all pairs of vectors B 1 and B 2 whose componentwise intersection is B. Intuitively, this holds because F is the disjoint union of F 1 and F 2 , and each τ ∈ A(σ, B) decomposes uniquely into τ|Z 1 ∈ A 1 (σ, B 1 ) and τ|Z 2 ∈ A 2 (σ, B 2 ) with B 1 ∩ B 2 = B.
We can now compute the values Q t (σ, B) by dynamic programming along the tree decomposition, maintaining for each node t a table of records R t (σ, B) that stores the nonzero values Q t (σ, B).

1. For each leaf node t ∈ T with C t = (C 1 , . . . , C p ), and each assignment σ : X t → D, create the record R t (σ, (C 1 |σ, . . . , C p |σ)) with value 1 (no variables or constraints are forgotten below a leaf, so A t (σ, B) is either {σ} or empty, and the empty product over F t = ∅ equals 1). Mark all leaf nodes done.

2. Do the following until the root r ∈ T is marked done. If t ∈ T is an unmarked node all of whose children are marked done, compute the records R t based on the node type of t:

(a) If t introduces a variable x, go through all nonzero records R t' (σ', B') of its child t'. For each assignment σ_x^d = {x ↦ d}, compute the assignment σ = σ' ∪ σ_x^d as well as the vector B with B i = B' i |σ_x^d for each i, and set R t (σ, B) = R t' (σ', B'). Mark t done.

(b) If t introduces a constraint C such that C t' = (C 1 , . . . , C p ) and C t = (C 1 , . . . , C p , C), enumerate the nonzero records R t' (σ', B') and then set R t (σ', B) = R t' (σ', B'), where B = (B' 1 , . . . , B' p , C|σ'). Mark t done.
(c) If t is a variable forget node with child t', go through all nonzero records R t' (σ', B') and add R t' (σ', B') to the entry R t (σ'|X t , B'). If the entry does not exist, create it and initialize it with 0. Mark t done.

(d) If t forgets a constraint C such that C t' = (C 1 , . . . , C p , C) and C t = (C 1 , . . . , C p ), go through all nonzero records R t' (σ', (B', P)) and add val(C, P) · R t' (σ', (B', P)) to the entry R t (σ', B'). Create and initialize entries with 0 if necessary. Mark t done.
(e) For a join node t, go through all pairs of nonzero records R t 1 (σ, B 1 ) and R t 2 (σ, B 2 ) of its children t 1 and t 2 , and add the product R t 1 (σ, B 1 )R t 2 (σ, B 2 ) to the record R t (σ, B 1 ∩ B 2 ). Create and initialize records with 0 if necessary. Mark t done.
3. Once the root is marked done, there are two possibilities. If the record R r (ε, ()) exists, output its value; otherwise, output 0. Here, ε : ∅ → D denotes the empty assignment and () the empty tuple.

Lemma 7. The above algorithm outputs sol(I).
Proof. We prove that R t (σ, B) = Q t (σ, B) whenever the entry R t (σ, B) exists, and Q t (σ, B) = 0 otherwise. For leaf nodes t this is immediate from Lemma 6. Inductively assume the statement holds for the children of a node t.
(a) Let t be a node that introduces variable x. The entry R t (σ, B) exists if, and only if, there is a record R t' (σ', B') with σ = σ' ∪ σ_x^d and B i = B' i |σ_x^d for each i. If the entry R t (σ, B) exists then R t (σ, B) = R t' (σ', B') and, by assumption, R t' (σ', B') = Q t' (σ', B'), so we have R t (σ, B) = Q t (σ, B) by Lemma 1. If the entry does not exist then there is no such record R t' (σ', B'), so Q t' (σ', B') = 0 by assumption and Q t (σ, B) = 0 by Lemma 1.
(b) Let t be a node that introduces a constraint C. An entry R t (σ, B) exists if, and only if, there is a record R t' (σ, B') and B p = C|σ. If the entry exists then R t (σ, B) = R t' (σ, B'). By assumption, R t' (σ, B') = Q t' (σ, B') and by Lemma 2 Q t (σ, B) = Q t' (σ, B'), so R t (σ, B) = Q t (σ, B) as required. If the record does not exist then there is no record R t' (σ, B') or B p ≠ C|σ. In the former case Q t' (σ, B') = 0 by assumption and thus Q t (σ, B) = 0 by Lemma 2. In the latter case Q t (σ, B) = 0 by Lemma 2.
(c) Let t be a variable forget node and let x be the variable that is forgotten. A record R t (σ, B) exists if, and only if, there is a nonzero record R t' (σ ∪ σ_x^d, B) for some d ∈ D, and R t (σ, B) corresponds to the sum of these records in this case. By assumption, R t' (σ ∪ σ_x^d, B) = Q t' (σ ∪ σ_x^d, B) whenever such a record exists, so R t (σ, B) = Σ_{d ∈ D} Q t' (σ ∪ σ_x^d, B) = Q t (σ, B) by Lemma 3. If the record R t (σ, B) does not exist, then there is no nonzero record R t' (σ ∪ σ_x^d, B) and thus Q t' (σ ∪ σ_x^d, B) = 0 for each d ∈ D by assumption. Thus again Q t (σ, B) = 0 by Lemma 3.
(d) Let t be a constraint forget node and let C be the constraint that is forgotten. There is a record R t (σ, B) if, and only if, there is a nonzero record R t' (σ, (B, P)) for some P ∈ C[Z], and in that case R t (σ, B) = Σ_{P ∈ C[Z]} val(C, P) · R t' (σ, (B, P)). By assumption, each record satisfies R t' (σ, (B, P)) = Q t' (σ, (B, P)), and Q t' (σ, (B, P)) = 0 whenever the record R t' (σ, (B, P)) does not exist, so R t (σ, B) = Q t (σ, B) by Lemma 4. If there is no record R t (σ, B), then Q t' (σ, (B, P)) = 0 for every P ∈ C[Z] by assumption, and thus Q t (σ, B) = 0 by Lemma 4.
(e) Let t be a join node with children t 1 and t 2 . The entry R t (σ, B) exists if, and only if, there is a pair of nonzero records R t 1 (σ, B 1 ) and R t 2 (σ, B 2 ) such that B 1 ∩ B 2 = B. If such a pair exists we have

R t (σ, B) = Σ_{B 1 ∩ B 2 = B} R t 1 (σ, B 1 ) · R t 2 (σ, B 2 ).

By assumption, each term satisfies R t i (σ, B i ) = Q t i (σ, B i ), and Q t 1 (σ, B 1 ) · Q t 2 (σ, B 2 ) = 0 whenever a pair B 1 , B 2 with B 1 ∩ B 2 = B does not appear as a term in the above sum. The equality R t (σ, B) = Q t (σ, B) is then immediate from Lemma 5. If there is no record R t (σ, B), there is no pair of nonzero records R t 1 (σ, B 1 ), R t 2 (σ, B 2 ) with B 1 ∩ B 2 = B. Thus, by assumption, Q t 1 (σ, B 1 ) = 0 or Q t 2 (σ, B 2 ) = 0 for each pair B 1 , B 2 such that B 1 ∩ B 2 = B. It again follows from Lemma 5 that Q t (σ, B) = 0.
Let sup be the largest size of a support over all constraints in C, let dom denote |D|, and let k be the width of the tree decomposition (T, χ).

Lemma 8. The dynamic programming algorithm described above runs in time (dom + sup + 1)^{ck} · |I| for a suitable constant c.
Proof. For each node t we may have entries R t (σ, B) indexed by pairs (σ, B), where σ ∈ D^{X t } and B ∈ C 1 [Z t ] × · · · × C p [Z t ]. By Observation 1, |C[Z t ]| ≤ sup + 1 for any constraint C ∈ C t , and thus the number of entries at node t is bounded by dom^{|X t |} · (sup + 1)^{|C t |}. It is not difficult to see that computing the entries for join nodes t is the computationally most demanding step. For each fixed assignment σ of X t we compute the product of Q t 1 (σ, B 1 ) and Q t 2 (σ, B 2 ) and add it to Q t (σ, B 1 ∩ B 2 ). Therefore, the update at t takes time O*(dom^{|X t |} · (sup + 1)^{2|C t |}), where O*() suppresses polynomial factors. As the number of tree nodes is O(|I|) by Fact 1, the overall running time of the dynamic programming algorithm is O*(dom^k · (sup + 1)^{2k}) · |I| and thus in (dom + sup + 1)^{ck} · |I| for a large enough constant c.
One can compute a nice tree decomposition of the incidence graph of width at most 5 tw* + 4 in time O(tw* · c^{tw*} · |I|) by running the algorithm of Fact 1 at most tw* times. In combination with the preceding lemmas, this proves the main result of this section.
Theorem 1. #CSPD can be solved in time (dom + sup + 1)^{O(tw*)} · |I|, and hence in time |I|^{O(tw*)}.

Fixed-Parameter Tractability of #CSPD
We use the framework of Parameterized Complexity (Cygan et al., 2015; Downey & Fellows, 1999; Flum & Grohe, 2006; Gottlob & Szeider, 2008; Niedermeier, 2006) to provide a fine-grained complexity analysis of the algorithm presented in Subsection 3.2. A parameterized problem P takes a tuple (I, k) as an input instance, where k ∈ N is called the parameter. A parameterized problem is fixed-parameter tractable (FPT for short), parameterized by k, if it can be solved by an algorithm which runs in time f(k) · |I|^{O(1)} for some computable function f. Algorithms with a running time of this form are called fixed-parameter algorithms. On the other hand, an algorithm which solves P in time |I|^{f(k)} for some computable function f is called an XP algorithm, and parameterized problems which admit such an algorithm are said to belong to the class XP. The complexity class XP properly contains the class FPT. A parameterized problem belongs to the class para-NP if it admits a non-deterministic fixed-parameter algorithm.
From the parameterized complexity perspective, the algorithm of Subsection 3.2 is an XP algorithm for #CSPD parameterized by incidence treewidth. For a tuple σ of parameters, let us denote by #CSPD(σ) the problem #CSPD parameterized by the combined parameter σ. The following is immediate from Theorem 1, which implies that #CSPD(tw*) can be solved in time |I|^{O(tw*)}.

Corollary 1. #CSPD(tw*) is in XP.
Consider the combined parameter (tw*, dom, sup); equivalently, one can take the sum of the three values as the parameter. It is easy to see that the analysis in the proof of Theorem 1 establishes that #CSPD is fixed-parameter tractable with respect to this combined parameter.

Corollary 2. #CSPD(σ) is FPT for the combined parameter σ = (tw*, dom, sup).
Corollary 2 generalizes a result to the effect that #CSPD(tw, dom) is FPT (Capelli, 2016). Before proceeding, we introduce the notion of parameter domination (Samer & Szeider, 2010b). Let σ = (p 1 , . . . , p r ) and σ' = (p' 1 , . . . , p' s ) be two combined parameters. We say that σ' dominates σ, and write σ ⪯ σ', if for each 1 ≤ i ≤ r there exists a computable function f that is monotonically increasing in each argument such that for each instance I we have p i (I) ≤ f(p' 1 (I), . . . , p' s (I)). It is not difficult to see that parameter domination propagates fixed-parameter tractability:

Lemma 9 (Samer & Szeider, 2010b). Let σ and σ' be two combined parameters such that σ ⪯ σ'. If #CSPD(σ) is fixed-parameter tractable, then so is #CSPD(σ').
Hence, to see that Corollary 2 implies fixed-parameter tractability of #CSPD(tw, dom), we only need to establish the parameter domination (tw*, dom, sup) ⪯ (tw, dom). First, it is known that tw* ≤ tw + 1 (Kolaitis & Vardi, 2000). Second, the maximum arity d of a #CSPD instance provides a lower bound on the primal treewidth tw, since any constraint of arity d yields a clique of size d in the primal graph; therefore d ≤ tw + 1. Now, we have sup ≤ dom^d ≤ dom^{tw+1}. Therefore, the parameter domination holds as claimed.
A natural follow-up question to Corollaries 1 and 2 is whether #CSPD is fixed-parameter tractable when we drop some component(s) out of (tw * , dom, sup). To answer this question, we introduce some terminology of parameterized complexity.
An fpt-reduction from a parameterized problem P to a parameterized problem Q is a fixed-parameter algorithm that maps an instance (I, k) of P to an equivalent instance (I', k') of Q such that k' ≤ g(k) for some computable function g. The notion of fpt-reduction plays a role in parameterized complexity analogous to that of polynomial-time many-one reductions in classical complexity theory. Under fpt-reductions, a canonical hierarchy of complexity classes, called the W-hierarchy, is well defined: FPT ⊆ W[1] ⊆ W[2] ⊆ · · · ⊆ W[P] ⊆ XP.
The standard assumption is that FPT ≠ W[1], and it is known that FPT = W[1] would imply the failure of the Exponential Time Hypothesis (Chen, Huang, Kanj, & Xia, 2006). Therefore, if a parameterized problem is W[i]-hard (under fpt-reductions), it is unlikely that the problem admits a fixed-parameter algorithm.
On the other hand, W[P] ⊆ para-NP holds as well. A classic example of a para-NP-complete problem is q-Coloring parameterized by q. One can verify whether a given q-coloring of a graph is proper in (uniform) polynomial time, and thus the problem is in para-NP; since q-Coloring is NP-complete already for the constant q = 3, the problem is para-NP-complete. The class para-NP is not contained in XP unless P = NP. We refer the reader to other sources (Downey & Fellows, 1999; Flum & Grohe, 2006; Cygan et al., 2015) for an in-depth treatment of parameterized complexity. Now, we consider the problem CSPD, the decision version of #CSPD asking whether sol(I) > L, where L is a part of the input. Clearly, #CSPD is at least as hard as CSPD. The problem CSPD is NP-hard even when dom and sup are bounded by a constant (i.e., the problem is para-NP-hard). We can observe this by encoding 3-CNF Satisfiability as CSPD with dom = 2 and sup = 1: a given 3-CNF formula is satisfiable if and only if sol(I) > 0 for the resulting #CSPD instance I.

5. Fixed-parameter tractability of CSP parameterized by domain size, support size, and incidence treewidth (Samer & Szeider, 2010a).
Moreover, our algorithm can easily be adapted to compute a maximum of sums (rather than a sum of products) and to deal with valued constraint satisfaction problems (VCSPs) (Cohen, Cooper, Jeavons, & Krokhin, 2006). Tractability of #CSPD for instances with β-acyclic constraint hypergraphs was shown through an intricate variable elimination algorithm (Brault-Baron et al., 2015). This procedure naturally gives rise to a width parameter called the cover-width (Capelli, 2016). There are currently no efficient algorithms for computing this parameter. Whether bounds on the incidence treewidth can be translated into bounds on the cover-width (thus relating our dynamic programming algorithm to variable elimination) is an intriguing open question.