Boolean Equi-propagation for Concise and Efficient SAT Encodings of Combinatorial Problems

We present an approach to propagation-based SAT encoding of combinatorial problems, Boolean equi-propagation, where constraints are modeled as Boolean functions which propagate information about equalities between Boolean literals. This information is then applied to simplify the CNF encoding of the constraints. A key factor is that considering only a small fragment of a constraint model at one time enables us to apply stronger, and even complete, reasoning to detect equivalent literals in that fragment. Once detected, equivalences apply to simplify the entire constraint model and facilitate further reasoning on other fragments. Equi-propagation in combination with partial evaluation and constraint simplification provides the foundation for a powerful approach to SAT-based finite domain constraint solving. We introduce a tool called BEE (Ben-Gurion Equi-propagation Encoder) based on these ideas and demonstrate for a variety of benchmarks that our approach leads to a considerable reduction in the size of CNF encodings and subsequent speed-ups in SAT solving times.


Introduction
In recent years, Boolean SAT solving techniques have improved dramatically. Today's SAT solvers are considerably faster and able to manage larger instances than yesterday's. Moreover, encoding and modeling techniques are better understood and increasingly innovative. SAT is currently applied to solve a wide variety of hard and practical combinatorial problems, often outperforming dedicated algorithms. The general idea is to encode a (typically NP-)hard problem instance, $\mu$, to a Boolean formula, $\varphi_\mu$, such that the satisfying assignments of $\varphi_\mu$ correspond to the solutions of $\mu$. Given such an encoding, a SAT solver can be applied to solve $\mu$.
Tailgating the success of SAT technology are a variety of tools which can be applied to specify and then compile problem instances to corresponding SAT instances. The general objective of such tools is to facilitate the process of providing high-level descriptions of how the (constraint) problem at hand is to be solved. Typically, a constraint-based modeling language is introduced and used to model instances. Drawing on the analogy to programming languages, given such a description, a compiler can then provide a low-level executable for the underlying machine, namely, in our context, a formula for the underlying SAT or SMT solver.
For example, Cadoli and Schaerf (2005) introduce NP-SPEC, a logic-based specification language which allows specifying combinatorial problems in a declarative way. At the core of this system is a component which translates specifications to CNF formulae. Similarly, Sugar (Tamura, Taga, Kitagawa, & Banbara, 2009) is a SAT-based constraint solver. To solve a finite domain constraint satisfaction problem, it is first modeled in a constraint language (also called Sugar), then encoded to a CNF formula and solved using the MiniSAT solver (Eén & Sörensson, 2003). MiniZinc (Nethercote, Stuckey, Becket, Brand, Duck, & Tack, 2007) is a constraint modeling language that is compiled to the low-level target language FlatZinc, for which many solvers exist. In particular, FlatZinc instances are solved by fzntini (Huang, 2008) by encoding them to CNF, and by fzn2smt by encoding them to SMT-LIB (Barrett, Stump, & Tinelli, 2010).
Simplifying CNF formulae prior to the application of SAT solving is of the utmost importance, and there is a wide range of techniques that can be applied to achieve this goal. See, for example, the work of Li (2003), Eén and Biere (2005), Heule, Järvisalo, and Biere (2011), and Manthey (2012), and the references therein. All of these techniques exhibit a clear trade-off between the amount of simplification obtained and the time it requires. Moreover, the stronger techniques become prohibitive when the SAT model involves hundreds of thousands of variables and millions of clauses. So in CNF simplification tools, time limits on simplification techniques are imposed and/or approximations are used. This paper takes a new approach to CNF simplification. Typically, a CNF is not a random collection of clauses, but rather has a structure derived from an application or specific problem domain. When SAT solving is applied to encode and solve finite domain constraint problems, the original constraint model is a manifestation of this structure. Usually, the constraints are discarded once encoded to CNF. We advocate that maintaining the constraints provides important structural information that can be applied to drive the process of CNF simplification. To be specific, the constraints in a model induce a partitioning of their CNF encoding into a conjunction of sub-formulae which we call "portions".
The novelty in our approach to CNF simplification is that instead of considering the CNF as a whole, we assume that it is partitioned into a conjunction of smaller portions. Then simplification is repeatedly applied to individual portions. This facilitates a propagation-based process because the simplification of one portion propagates information to all of the portions, and this information may trigger further simplification in other portions.
Because portions are typically much smaller than the entire CNF, we can effectively apply stronger simplification algorithms. We introduce the notion of equi-propagation. Just as unit propagation is about inferring unit clauses which can then be applied to simplify CNF formulae, equi-propagation is about inferring equational consequences between literals (and Boolean constants).
There is a wide body of research on CNF simplification that can be applied to implement equi-propagation, which is sometimes called equivalent literal substitution (see, for example, Gelder, 2005). Techniques typically involve binary clause based simplifications using, among others, hyper binary resolution and binary implication graphs. See, for example, the work of Heule et al. (2011) and the references therein. The guiding principle in all of these works is that techniques must be simple and efficient because of the prohibitive size of the CNF to which they must apply.
Our approach is different: we focus on far richer forms of inference, not even related to the CNF structure of a formula. At one extreme we apply complete equi-propagation, which detects all equivalences implied by a formula. Clearly, complete equi-propagation is NP-hard. However, complete equi-propagators are feasible because we apply them only to small portions of the formula. When complete equi-propagation is too slow, we consider ad-hoc techniques. All of these forms of equi-propagation have in common that they are not driven by the CNF structure (e.g., binary clauses) but rather by the underlying constraint structure from which a CNF was, or is being, generated.
The rest of this paper is structured as follows. Section 2 introduces a modeling language for finite domain constraints which consists of just 5 constraint constructs and is sufficient to illustrate the contribution of the paper. We argue that the constraints in a model induce a natural partition of their CNF encoding into smaller portions and that this partition can be used to drive the simplification of the CNF encoding. Section 3 presents equi-propagation, the first ingredient of our contribution. Equi-propagation is about learning information that can be applied to simplify CNF encodings. Section 4 describes a practical basis for implementing equi-propagation. Section 5 introduces the second ingredient: partial evaluation. Given the information derived using equi-propagation, partial evaluation applies to simplify the constraints and in particular to remove Boolean variables from their CNF encodings. Section 6 describes a tool called BEE (Ben-Gurion Equi-propagation Encoder; Metodi & Codish, 2012) that is based on equi-propagation and partial evaluation. There we introduce our full constraint language, which is similar to Sugar and to the subset of FlatZinc relevant for finite domain constraint problems. We also spell out the special treatment of the all-different constraint in BEE. Section 7 demonstrates the application of BEE. Section 8 presents an experimental evaluation. Finally, Section 9 presents our conclusion.

Constraint Based Boolean Modeling
This section provides the basis for our contribution: a constraint-based modeling language, together with a Boolean interpretation for each constraint in the language. This enables us to view a constraint model as a conjunction of Boolean formulae and provides a structure which drives the subsequent encoding to CNF.
We first introduce a simple and small fragment of a typical finite domain constraint-based modeling language. This serves to illustrate our approach. Later, in Section 6, we show the full language. We then discuss several options for the Boolean representation of integers. In this paper we adopt a particular unary representation, called the order encoding. Our contribution is independent of this choice, although equi-propagation works well with it. Finally, we show how each of the constraints in the language fragment can be viewed as a Boolean formula, and a constraint model as their conjunction.

Constraint Language Fragment
We focus on a small fragment of a typical constraint modeling language, detailed in Figure 1. This serves to present the main ideas of the paper. Constraint (1) declares a finite domain integer variable in the range $[c_1, c_2]$. For simplicity of presentation we further assume that $c_1 \geq 0$. Constraints (2-3) are about differences between integer variables, and constraints (4-5) are about sums of integer variables. As syntactic sugar we also allow writing integer constants in constraints. For example, int_neq(I, 5) is short for new_int(I', 5, 5), int_neq(I, I').

Modeling Kakuro: an Example
A Kakuro puzzle is an n × m board of black and white cells. The black cells contain hints and the white cells are to be filled by numbers between 1 and 9 (the bound 9 is often generalized to a larger value r). The hints specify constraints on the sums of the values in blocks of white cells to the right of and/or below the hint. The numbers assigned to the white cells in such a block are required to be "all different". Figure 2 illustrates a 4 × 4 Kakuro puzzle (left) and its solution (right).
To model a Kakuro puzzle we view it as a set of blocks (of white cells), where each block B is a set of integer variables and is associated with a corresponding integer value, hint(B). Each block B is associated with two constraints: the integers in B must sum to hint(B) and must be all different. Figure 3 illustrates the constraints corresponding to the Kakuro instance in Figure 2.
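The block-based model can be generated mechanically from a list of blocks. The following Python sketch is only an illustration: the tuple format for constraints is hypothetical (chosen to mirror the constraint names new_int, int_array_plus and allDiff used in this paper, not BEE's concrete syntax), and the hint is passed as an integer constant using the syntactic sugar described in the language fragment.

```python
def kakuro_model(blocks, r=9):
    """Build constraint tuples for a Kakuro instance.

    blocks: list of (hint, [cell names]); r: upper bound for white cells.
    The tuple format is hypothetical and only mirrors the constraint names in the text.
    """
    cells = sorted({cell for _, block in blocks for cell in block})
    model = [("new_int", cell, 1, r) for cell in cells]       # one declaration per white cell
    for hint, block in blocks:
        model.append(("int_array_plus", tuple(block), hint))  # the cells sum to the hint
        model.append(("allDiff", tuple(block)))               # the cells are pairwise different
    return model


# Two overlapping blocks sharing cell "a" (a toy instance, not the puzzle of Figure 2).
for constraint in kakuro_model([(4, ["a", "b"]), (3, ["a", "c"])]):
    print(constraint)
```

Each cell is declared once even when it belongs to two blocks (one horizontal, one vertical), which is what ties the block constraints together.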

Representing Integers
A fundamental design choice when encoding finite domain constraints concerns the representation of integer variables. Gavanelli (2007) discusses several such representations (the direct-, support- and log-encodings) and introduces the log-support encoding. Given a choice of representation, constraints are bit-blasted and interpreted as Boolean formulae. We focus for now on the use of a unary representation, the so-called order encoding (see, e.g., Crawford & Baker, 1994; Bailleux & Boufkhad, 2003), which has many nice properties when applied to small finite domains.
In the order-encoding, an integer variable $X$ in the domain $[0, \ldots, n]$ is represented by a bit vector $X = [x_1, \ldots, x_n]$. Each bit $x_i$ is interpreted as $X \geq i$, so in particular the bit sequence $X$ constitutes a monotonic non-increasing Boolean sequence. For example, the value 3 in the interval $[0, 5]$ is represented in 5 bits as $[1, 1, 1, 0, 0]$.
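A minimal Python sketch of the order encoding as just described, assuming values in $[0, n]$ and bits given as 0/1 integers (the function names are illustrative, not part of BEE):

```python
def encode_order(value, n):
    """Order-encode value in [0, n] as n bits; bit i (1-based) is 1 iff value >= i."""
    if not 0 <= value <= n:
        raise ValueError("value out of range")
    return [1 if value >= i else 0 for i in range(1, n + 1)]


def decode_order(bits):
    """Decode a monotonic non-increasing bit vector: the value is its number of 1s."""
    if any(a < b for a, b in zip(bits, bits[1:])):
        raise ValueError("not an order encoding: a 0 precedes a 1")
    return sum(bits)


print(encode_order(3, 5))              # [1, 1, 1, 0, 0]
print(decode_order([1, 1, 1, 0, 0]))   # 3
```

Decoding simply counts the leading ones, which is valid only because the vector is monotonic; the check rejects vectors such as $[0, 1]$ that do not represent any integer.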
An important property of a Boolean representation for finite domain integers is the ability to represent changes in the set of values a variable can take. It is well known that the order-encoding facilitates the propagation of bounds. Consider an integer variable $X = [x_1, \ldots, x_n]$ with values in the interval $[0, n]$. To restrict $X$ to take values in the range $[a, b]$ (for $1 \leq a \leq b \leq n$), it is sufficient to assign $x_a = 1$ and $x_{b+1} = 0$ (if $b < n$). The variables $x_{a'}$ for $1 \leq a' < a$ and $x_{b'}$ for $b + 1 < b' \leq n$ are then determined true and false, respectively, by unit propagation. For example, given $X = [x_1, \ldots, x_9]$, assigning $x_3 = 1$ and $x_6 = 0$ propagates to give $X = [1, 1, 1, x_4, x_5, 0, 0, 0, 0]$, signifying that $dom(X) = \{3, 4, 5\}$.
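The bound propagation in this example can be reproduced with a small unit propagator over the monotonicity clauses $x_{i+1} \rightarrow x_i$. This is a sketch under our own conventions: literals are DIMACS-style signed integers ($i$ for $x_i$, $-i$ for $\neg x_i$), and the propagator is a naive fixpoint loop rather than a watched-literal implementation.

```python
def monotonic_clauses(n):
    """Clauses (x_i OR NOT x_{i+1}) for i = 1..n-1, i.e. x_{i+1} -> x_i."""
    return [[i, -(i + 1)] for i in range(1, n)]


def unit_propagate(clauses, assignment):
    """Naive unit propagation; assignment is a set of true literals. None on conflict."""
    assign = set(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in assign for lit in clause):
                continue                        # clause already satisfied
            free = [lit for lit in clause if -lit not in assign]
            if not free:
                return None                     # every literal is false: conflict
            if len(free) == 1:
                assign.add(free[0])             # unit clause: its literal is forced
                changed = True
    return assign


def show(n, assign):
    """Render the bit vector, keeping undetermined bits as their names."""
    return [1 if i in assign else 0 if -i in assign else f"x{i}" for i in range(1, n + 1)]


n = 9
result = unit_propagate(monotonic_clauses(n), {3, -6})   # x3 = 1 and x6 = 0
print(show(n, result))                                   # [1, 1, 1, 'x4', 'x5', 0, 0, 0, 0]
```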
We observe an additional property of the order-encoding for $X = [x_1, \ldots, x_n]$: its ability to specify that a variable cannot take a specific value $0 \leq v \leq n$ in its domain by equating two variables: $x_v = x_{v+1}$ (where, at the boundaries, $x_0$ is taken to be 1 and $x_{n+1}$ to be 0). This indicates that the order-encoding is well suited not only to propagate lower and upper bounds, but also to represent integer variables with an arbitrary finite set as domain. For example, given $X = [x_1, \ldots, x_9]$, equating $x_2 = x_3$ imposes that $X \neq 2$. Likewise, $x_5 = x_6$ and $x_7 = x_8$ impose that $X \neq 5$ and $X \neq 7$. Applying these equalities to $X$ gives $X = [x_1, x_2, x_2, x_4, x_5, x_5, x_7, x_7, x_9]$ (note the repeated literals), signifying that $dom(X) = \{0, 1, 3, 4, 6, 8, 9\}$.
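The effect of such equalities on the domain can be checked by brute force. In this sketch (ours, for illustration only) an equality is a pair of bit positions $(a, b)$ meaning $x_a = x_b$, and positions 0 and $n+1$ act as the sentinels 1 and 0:

```python
def domain_after_equalities(n, equalities):
    """Values of X in [0, n] whose order encoding satisfies every equality x_a = x_b.

    Bit positions 0 and n + 1 are sentinels with values 1 and 0, respectively.
    """
    def bit(value, i):
        return 1 if value >= i else 0   # x_i means X >= i, so x_0 = 1 and x_{n+1} = 0

    return [v for v in range(n + 1)
            if all(bit(v, a) == bit(v, b) for a, b in equalities)]


print(domain_after_equalities(9, [(2, 3), (5, 6), (7, 8)]))   # [0, 1, 3, 4, 6, 8, 9]
```

Each equality $x_v = x_{v+1}$ removes exactly the value $v$, since $v$ is the only value whose encoding has $x_v = 1$ and $x_{v+1} = 0$.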
The order-encoding has many additional nice features that can be exploited to simplify constraints and their encodings to CNF. To illustrate one, consider a constraint of the form $A + B = 5$ where $A$ and $B$ are integer variables in the range between 0 and 5 represented in the order-encoding. At the bit level we have $A = [a_1, \ldots, a_5]$ and $B = [b_1, \ldots, b_5]$. The constraint is satisfied precisely when $B = [\neg a_5, \ldots, \neg a_1]$, that is, $b_i = \neg a_{6-i}$ for $1 \leq i \leq 5$. Instead of encoding the constraint to CNF, we substitute the bits $b_1, \ldots, b_5$ by the literals $\neg a_5, \ldots, \neg a_1$, and remove the constraint. In Section 3 we formalize this process of discovering equalities between literals implied by a constraint and using them to simplify CNF encodings.
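The claim that $A + B = 5$ amounts to $b_i = \neg a_{6-i}$ can be verified exhaustively. The following Python check (our illustration, generalized to $A + B = n$ on $n$ bits) confirms that the encoding of $n - A$ is the encoding of $A$ reversed with every bit negated:

```python
def order_bits(value, n):
    """Order encoding of value in [0, n]: bit i (1-based) is 1 iff value >= i."""
    return [1 if value >= i else 0 for i in range(1, n + 1)]


def sum_substitution_holds(n):
    """Check that A + B = n forces B = [not a_n, ..., not a_1] for every A in [0, n]."""
    for a in range(n + 1):
        A, B = order_bits(a, n), order_bits(n - a, n)
        if B != [1 - bit for bit in reversed(A)]:   # b_i = not a_{n+1-i}
            return False
    return True


print(sum_substitution_holds(5))   # True
```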

Bit Blasting
Given a constraint model and the decision on how to represent finite domain integer variables at the bit level (we chose the order encoding), "bit-blasting" is the process of instantiating integer variables by corresponding bit vectors and interpreting constraints as Boolean formulae.
Each integer variable $I$ declared by a constraint of the form $\mathit{new\_int}(I, c_1, c_2)$, where $0 \leq c_1 \leq c_2$, is represented as a bit vector $I = [1, \ldots, 1, x_{c_1+1}, \ldots, x_{c_2}]$. So, we may view a constraint model as consisting only of Boolean variables, and each constraint $c$ corresponds to a Boolean formula, denoted $[\![c]\!]$, the "bit-blasted" version of $c$. The specific definition of $[\![\cdot]\!]$ is not important. Just for illustration, note that one could define
$$[\![\mathit{new\_int}(I, c_1, c_2)]\!] = \bigwedge_{i=c_1+1}^{c_2-1} (x_{i+1} \rightarrow x_i) \qquad [\![\mathit{int\_neq}(I_1, I_2)]\!] = \bigvee_{i=1}^{n} \neg(x_i \leftrightarrow y_i)$$
where, to simplify presentation, we assume that $I_1 = [x_1, \ldots, x_n]$ and $I_2 = [y_1, \ldots, y_n]$ are represented in the same number of bits. The mapping $[\![\cdot]\!]$ extends in the natural way to conjunctions of constraints. So, given a constraint model such as the one in Figure 3, integer variables are instantiated to unary (order encoding) bit vectors and each constraint is viewed as a Boolean formula. The constraint model takes a Boolean representation as the conjunction of these formulae.
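To make such an illustrative definition concrete, the following sketch (ours, not BEE's definition) evaluates a monotonicity formula for new_int and a bitwise disjunction for int_neq on 0/1 vectors, and checks by exhaustive enumeration that, on valid order encodings, the bitwise formula for int_neq agrees with integer disequality. It assumes both vectors have the same length $n$ and the full domain $[0, n]$.

```python
from itertools import product


def blast_new_int(bits):
    """Monotonicity of the bits: x_{i+1} -> x_i for consecutive bits."""
    return all(bits[i] or not bits[i + 1] for i in range(len(bits) - 1))


def blast_int_neq(xs, ys):
    """Bitwise disequality for equal-length vectors: some position differs."""
    return any(x != y for x, y in zip(xs, ys))


def int_neq_is_faithful(n):
    """On valid order encodings, the bitwise formula agrees with integer disequality."""
    for xs in product((0, 1), repeat=n):
        for ys in product((0, 1), repeat=n):
            if blast_new_int(xs) and blast_new_int(ys):
                if blast_int_neq(xs, ys) != (sum(xs) != sum(ys)):
                    return False
    return True


print(int_neq_is_faithful(4))   # True
```

Without the monotonicity conjuncts the equivalence fails (for example, $[0, 1]$ and $[1, 0]$ differ bitwise but both have one 1), which is why bit-blasting conjoins the declarations.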

Boolean Equi-propagation
In this section we present an approach to propagation-based SAT encoding, Boolean equi-propagation, which propagates information about equalities between Boolean literals (and constants). We prove that Boolean equi-propagation is at least as strong as unit propagation, in that it determines at least as many fixed literals as unit propagation. We demonstrate, with an example, the power of equi-propagation and show that it leads to a considerable reduction in the size of the CNF encoding.

Boolean Equi-propagation
Let $\mathcal{B}$ be a set of Boolean variables. A literal is a Boolean variable $b \in \mathcal{B}$ or its negation $\neg b$. The negation of a literal $\ell$, denoted $\neg\ell$, is defined as $\neg b$ if $\ell = b$ and as $b$ if $\ell = \neg b$. The Boolean constants 1 and 0 represent true and false, respectively. The set of literals is denoted $\mathcal{L}$ and $\mathcal{L}_{0,1} = \mathcal{L} \cup \{0, 1\}$. The set of (free) Boolean variables that appear in a Boolean formula $\varphi$ is denoted $vars(\varphi)$. We extend the $vars$ function to sets of formulae in the natural way.
An assignment, $A$, is a partial mapping from Boolean variables to constants, often viewed as the following set of literals: $\{b \mid A(b) = 1\} \cup \{\neg b \mid A(b) = 0\}$. For a formula $\varphi$ and $b \in \mathcal{B}$, we denote by $\varphi[b]$ (likewise $\varphi[\neg b]$) the formula obtained by substituting all occurrences of $b$ in $\varphi$ by true (false). This notation extends in the natural way to sets of literals. We say that $A$ satisfies $\varphi$ if $vars(\varphi) \subseteq vars(A)$ and $\varphi[A]$ evaluates to true. The Boolean Satisfiability (SAT) problem is, given a Boolean formula $\varphi$, to determine whether there exists an assignment which satisfies $\varphi$.
A Boolean equality is a constraint $\ell_1 = \ell_2$ where $\ell_1, \ell_2 \in \mathcal{L}_{0,1}$. An equi-formula $E$ is a set of Boolean equalities understood as a conjunction. The set of Boolean equalities is denoted $\mathcal{L}^{eq}_{0,1}$ and the set of equi-formulae is denoted $\mathcal{E}$.

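Equi-propagation is a process of inferring equational consequences from a Boolean formula and given equational information. An equi-propagator for a formula $\varphi$ is an extensive function $\mu_\varphi : \mathcal{E} \rightarrow \mathcal{E}$ defined such that for all $E \in \mathcal{E}$,
$$E \subseteq \mu_\varphi(E) \subseteq \{\, e \in \mathcal{L}^{eq}_{0,1} \mid \varphi \wedge E \models e \,\}.$$
That is, $\mu_\varphi(E)$ is a conjunction of equalities, at least as strong as $E$, made true by $\varphi \wedge E$. We say that an equi-propagator $\mu_\varphi$ is complete if $\mu_\varphi(E) = \{\, e \in \mathcal{L}^{eq}_{0,1} \mid \varphi \wedge E \models e \,\}$. We denote a complete equi-propagator for $\varphi$ as $\hat{\mu}_\varphi$. We assume that equi-propagators are monotonic:
$$E_1 \subseteq E_2 \;\Rightarrow\; \mu_\varphi(E_1) \subseteq \mu_\varphi(E_2).$$
In particular, this follows by definition for complete equi-propagators. In Section 3.3 we discuss several methods to implement complete and incomplete equi-propagators.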
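A complete equi-propagator can be prototyped by brute force for very small formulae: enumerate the models of $\varphi \wedge E$ and collect every equality that holds in all of them. The Python sketch below does this; the literal representation ((name, True) for a variable, (name, False) for its negation, 0/1 for constants) and the function names are assumptions of the sketch. It returns a generating set (unit equalities for fixed variables and pairwise equalities between the remaining variables) from which every equality in $\hat{\mu}_\varphi(E)$ follows, rather than the full closed set.

```python
from itertools import combinations, product


def val(lit, m):
    """Value of a literal in model m: 0/1 are constants, (v, True) is v, (v, False) is not v."""
    if lit in (0, 1):
        return lit
    name, positive = lit
    return m[name] if positive else 1 - m[name]


def complete_equi_propagate(variables, phi, E):
    """Brute-force complete equi-propagator (exponential, so only for tiny portions)."""
    models = []
    for bits in product((0, 1), repeat=len(variables)):
        m = dict(zip(variables, bits))
        if phi(m) and all(val(a, m) == val(b, m) for a, b in E):
            models.append(m)
    if not models:
        return {(0, 1)}                       # phi AND E is unsatisfiable: it entails 0 = 1
    result, fixed = set(), set()
    for v in variables:
        values = {m[v] for m in models}
        if len(values) == 1:                  # v takes the same value in every model
            result.add(((v, True), values.pop()))
            fixed.add(v)
    for v, w in combinations(variables, 2):
        if v in fixed or w in fixed:
            continue                          # these follow from the unit equalities
        if all(m[v] == m[w] for m in models):
            result.add(((v, True), (w, True)))
        elif all(m[v] != m[w] for m in models):
            result.add(((v, True), (w, False)))
    return result


def monotonic(bits):
    return all(bits[i] >= bits[i + 1] for i in range(len(bits) - 1))


def phi(m):
    # new_int(X, 0, 2), new_int(Y, 0, 2), int_neq(X, Y) with X = [x1, x2], Y = [y1, y2]
    X, Y = [m["x1"], m["x2"]], [m["y1"], m["y2"]]
    return monotonic(X) and monotonic(Y) and X != Y


E = {(("x1", True), 1), (("x2", True), 0)}   # E fixes X = 1
result = complete_equi_propagate(["x1", "x2", "y1", "y2"], phi, E)
print(result)   # x1 = 1, x2 = 0 and y1 = y2, i.e. Y is 0 or 2
```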
Example 2. Consider the constraint $C = \mathit{new\_int}(X, 0, 4) \wedge \mathit{new\_int}(Y, 0, 4) \wedge \mathit{int\_neq}(X, Y)$ and its corresponding Boolean representation $\varphi = [\![C]\!]$ on the bit representation where $X = [x_1, \ldots, x_4]$ and $Y = [y_1, \ldots, y_4]$. Assume the setting where
The following theorem states that complete equi-propagation is at least as powerful as unit propagation.
Theorem 3. Let $\hat{\mu}_\varphi$ be a complete equi-propagator for a Boolean formula $\varphi$. Then any literal that is made true by unit propagation, for any clausal representation of $\varphi$, using the equations in $E$ is also determined true by $\hat{\mu}_\varphi(E)$.
The following example illustrates that equi-propagation can be more powerful than unit propagation.

Boolean Unifiers
It is sometimes convenient to view an equi-formula $E$ in a generic "solved form" as a Boolean substitution, $\theta_E$, which is a (most general) unifier for the equations in $E$. Boolean substitutions generalize assignments in that variables can be bound also to literals. A Boolean substitution is an idempotent mapping $\theta : \mathcal{B} \rightarrow \mathcal{L}_{0,1}$ where $dom(\theta) = \{\, b \in \mathcal{B} \mid \theta(b) \neq b \,\}$. Note also that $\theta$ is defined for all of $\mathcal{B}$ and that its domain, $dom(\theta)$, includes those elements for which it is non-identity. A Boolean substitution $\theta$ is viewed as the set $\theta = \{\, b \mapsto \theta(b) \mid b \in dom(\theta) \,\}$. We can apply $\theta$ to another substitution $\theta'$ to obtain the substitution $(\theta \circ \theta') = \{\, b \mapsto \theta(\theta'(b)) \mid b \in dom(\theta) \cup dom(\theta') \,\}$. A unifier for an equi-formula $E$ is a substitution $\theta$ such that $\models \theta(e)$ for each $e \in E$. A most-general unifier for $E$ is a substitution $\theta$ such that for any unifier $\theta'$ of $E$, there exists a substitution $\gamma$ where $\theta' = \gamma \circ \theta$.
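Example 5. Consider the equi-formula
We define a canonical most-general unifier $\mathit{unify}_E$ for any satisfiable equi-formula $E$ where:
$$\mathit{unify}_E(b) = \min\{\, \ell \in \mathcal{L}_{0,1} \mid E \models b \leftrightarrow \ell \,\}$$
(with respect to a fixed total order on $\mathcal{L}_{0,1}$). That is, the substitution $\mathit{unify}_E$ maps each $b$ to the smallest literal equivalent to $b$ given $E$. We can compute $\mathit{unify}_E$ in almost linear (amortized) time using a variation of the union-find algorithm (Tarjan, 1975).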
Example 6. For the equi-formula $E$ and substitution $\theta$ from Example 5 we have that $\mathit{unify}_E = \theta$.
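The union-find computation of $\mathit{unify}_E$ can be sketched as follows. Each equivalence class of literals is stored with a parity bit (whether a node equals, or is the negation of, its parent), and the constant 1 is a distinguished node, so that 0 is its negation. The order used to pick representatives (the constant first, then variable names alphabetically) is an assumption standing in for the fixed total order on $\mathcal{L}_{0,1}$. Always keeping the smaller root gives path compression without union by rank, so this sketch does not claim the amortized bound cited above.

```python
class EquiUnionFind:
    """Union-find with parity over literal equalities (a sketch of computing unify_E).

    Nodes are variable names plus ONE, the constant 1 (the constant 0 is NOT ONE).
    parity[v] == 1 means v is the negation of parent[v].
    """
    ONE = ""  # sorts before any non-empty variable name, so constants become representatives

    def __init__(self):
        self.parent, self.parity = {}, {}

    def _find(self, v):
        """Return (root, p) such that v == root XOR p, compressing the path on the way."""
        self.parent.setdefault(v, v)
        self.parity.setdefault(v, 0)
        if self.parent[v] == v:
            return v, 0
        root, p = self._find(self.parent[v])
        self.parent[v] = root
        self.parity[v] ^= p
        return root, self.parity[v]

    def _node(self, lit):
        """Map a literal (0, 1, (name, True) or (name, False)) to (node, parity)."""
        if lit in (0, 1):
            return self.ONE, 1 - lit
        name, positive = lit
        return name, 0 if positive else 1

    def add(self, lit1, lit2):
        """Record lit1 = lit2. Returns False if this contradicts earlier equalities."""
        a, pa = self._node(lit1)
        b, pb = self._node(lit2)
        ra, qa = self._find(a)
        rb, qb = self._find(b)
        rel = qa ^ pa ^ qb ^ pb          # the roots must satisfy ra == rb XOR rel
        if ra == rb:
            return rel == 0
        if rb < ra:
            ra, rb = rb, ra              # keep the smaller node as the root
        self.parent[rb] = ra
        self.parity[rb] = rel
        return True

    def unify(self, name):
        """Representative of a variable: 0, 1, or a literal (root, positive)."""
        root, p = self._find(name)
        if root == self.ONE:
            return 1 - p
        return (root, p == 0)


uf = EquiUnionFind()
uf.add(("x2", True), ("x3", True))       # x2 = x3
uf.add(("y1", True), ("x3", False))      # y1 = not x3
uf.add(("z", True), 0)                   # z = 0
print(uf.unify("y1"), uf.unify("z"))     # ('x2', False) 0
print(uf.add(("z", True), 1))            # False: z = 1 contradicts z = 0
```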
The following proposition provides the foundation for equi-propagation based Boolean simplification. It allows us to apply equational information to simplify a given formula. In particular, if $E$ is an equi-formula about literals occurring in $\varphi$, then $\mathit{unify}_E(\varphi)$ is smaller than $\varphi$ in that it contains fewer variables.
Proposition 1. Let $\varphi$ be a Boolean formula and $E \in \mathcal{E}$ be a satisfiable equi-formula. Then,
$$\varphi \wedge E \;\equiv\; \mathit{unify}_E(\varphi) \wedge E.$$

The Equi-propagation Process
The equi-propagation process presented now is a central theme in this paper. Let $\Phi = \varphi_1 \wedge \cdots \wedge \varphi_n$ be a partitioning of a Boolean formula into $n$ portions, let $\mu_{\varphi_1}, \ldots, \mu_{\varphi_n}$ be corresponding equi-propagators, and take initially $E = \emptyset$. Satisfiability of $\Phi$ can be determined as follows:
1. So long as possible, select $\varphi_i$ such that $\mu_{\varphi_i}(E) \neq E$ and update $E = \mu_{\varphi_i}(E)$.
2. Finally, when the equi-propagators apply no more, check whether $\mathit{unify}_E(\Phi)$ is satisfiable.
3. If $\eta$ is a satisfying assignment for $\mathit{unify}_E(\Phi)$, then $\eta \circ \mathit{unify}_E$ is a satisfying assignment for $\Phi$.
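Steps 1 and 2 can be prototyped as a fixpoint loop. In this sketch (our own illustration, exponential and only for toy inputs) each portion is a pair consisting of its variables and a Python predicate, and each $\mu_{\varphi_i}$ is a brute-force complete equi-propagator that enumerates over the portion's variables together with those already mentioned in $E$, so that equalities learned elsewhere are taken into account. The three portions encode $X = 1$, int_neq(X, Y) and $Y + Z = 2$ on 2-bit order-encoded integers; the literal representation is an assumption of the sketch.

```python
from itertools import combinations, product


def val(lit, m):
    """0/1 are constants, (v, True) is the variable v, (v, False) is its negation."""
    if lit in (0, 1):
        return lit
    name, positive = lit
    return m[name] if positive else 1 - m[name]


def vars_of(E):
    return {lit[0] for eq in E for lit in eq if lit not in (0, 1)}


def brute_equi(variables, phi, E):
    """Complete equi-propagator by enumeration; extensive: the result contains E."""
    models = []
    for bits in product((0, 1), repeat=len(variables)):
        m = dict(zip(variables, bits))
        if phi(m) and all(val(a, m) == val(b, m) for a, b in E):
            models.append(m)
    out = set(E)
    if not models:
        return out | {(0, 1)}                     # unsatisfiable: 0 = 1 is entailed
    for v in variables:
        if len({m[v] for m in models}) == 1:
            out.add(((v, True), models[0][v]))
    for v, w in combinations(variables, 2):
        if all(m[v] == m[w] for m in models):
            out.add(((v, True), (w, True)))
        elif all(m[v] != m[w] for m in models):
            out.add(((v, True), (w, False)))
    return out


def equi_propagation_process(portions):
    """Apply the portions' equi-propagators until none of them adds an equality."""
    E, changed = set(), True
    while changed:
        changed = False
        for variables, phi in portions:
            scope = sorted(set(variables) | vars_of(E))
            new_E = brute_equi(scope, phi, E)
            if new_E != E:                        # extensive, so new equalities were learned
                E, changed = new_E, True
            if (0, 1) in E:
                return E                          # the formula is unsatisfiable
    return E


def bits(m, names):
    return [m[v] for v in names]


def monotonic(b):
    return all(b[i] >= b[i + 1] for i in range(len(b) - 1))


X, Y, Z = ["x1", "x2"], ["y1", "y2"], ["z1", "z2"]
portions = [
    (X, lambda m: bits(m, X) == [1, 0]),                                    # X = 1
    (X + Y, lambda m: monotonic(bits(m, X)) and monotonic(bits(m, Y))
                      and bits(m, X) != bits(m, Y)),                        # int_neq(X, Y)
    (Y + Z, lambda m: monotonic(bits(m, Y)) and monotonic(bits(m, Z))
                      and sum(bits(m, Y)) + sum(bits(m, Z)) == 2),          # Y + Z = 2
]
E = equi_propagation_process(portions)
print((("y1", True), ("y2", True)) in E, (("y1", True), ("z1", False)) in E)   # True True
```

The equality $y_1 = y_2$ learned from the int_neq portion is what lets the sum portion derive $y_1 = \neg z_1$. By Theorem 8 the round-robin order used here does not affect the final $E$.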
We typically apply this equi-propagation theme to the Boolean representation $\Phi = [\![C_1]\!] \wedge \cdots \wedge [\![C_n]\!]$ of a constraint model $C = C_1 \wedge \cdots \wedge C_n$. Here we require that each $C_i$ is a "small" conjunction of constraints. Typically, the integer variables referred to in each $C_i$ are also declared in $C_i$ (sometimes this requires duplicating the variable declarations). For an individual constraint $c$ we denote by $c^+$ the conjunction of constraints consisting of $c$ and the declarations for the integer variables it refers to. The specifics of these declarations will be clear from the context.
Example 7. Let $C$ be the following constraint model: We have As a basis for equi-propagation we take and and $Z = [1, z_2, z_3]$ and applying corresponding complete equi-propagators and starting with $E_0 = \emptyset$ we have: At this point equi-propagation applies no more, and unify is a tautology (all of the Boolean variables are determined by equi-propagation).
The following theorem clarifies that the order in which equi-propagators are applied in the equi-propagation process does not influence the final result.
Theorem 8.The equi-propagation process is confluent.
The following proposition provides an alternative, more efficient to implement, definition for complete equi-propagation.
Proposition 2. Let ϕ be a Boolean formula and μϕ a complete equi-propagator for ϕ.
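Computing complete equi-propagation through Proposition 2 is considerably more efficient than computing it directly from the definition, since we can simply examine the formula $\varphi$ after the application of $\mathit{unify}_E$ to determine new Boolean equality consequences.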
Finally we comment: our intention is that the equi-propagation process be applied not only to make a SAT instance smaller but also to obtain a representation that is easier to solve. Decreasing the size of the CNF is not the main objective. In fact, we often explicitly introduce redundancies to improve a SAT encoding. For example, consider an "if-then-else" construct, $x \leftrightarrow ITE(s, t, f)$, where the propositional variable $s$ indicates the "selector", $t$ the "true branch", $f$ the "false branch", and $x$ the result. The corresponding CNF is $\{\{\neg s, \neg t, x\}, \{\neg s, t, \neg x\}, \{s, \neg f, x\}, \{s, f, \neg x\}\}$. Eén and Sörensson (2006) propose to add the redundant clauses $\{\neg t, \neg f, x\}$ and $\{t, f, \neg x\}$. They comment that this improves the encoding, and they observe that redundant clauses are often introduced to achieve arc-consistency in the SAT encoding. We show that, given a clausal encoding of some formula $\Phi$, the application of equi-propagation can only strengthen unit propagation.
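The effect of the redundant clauses on unit propagation can be checked directly. The following sketch (DIMACS-style signed integers as literals and a naive unit-propagation loop, both our own choices) confirms that the two extra clauses are implied by the four ITE clauses, and that from $t = f = 1$ only the extended clause set lets unit propagation derive $x$.

```python
from itertools import product

S, T, F, X = 1, 2, 3, 4                                     # variable ids; -v negates v
ITE = [[-S, -T, X], [-S, T, -X], [S, -F, X], [S, F, -X]]    # x <-> ITE(s, t, f)
REDUNDANT = [[-T, -F, X], [T, F, -X]]                       # the two extra clauses


def satisfies(clauses, model):
    """model maps each variable id to True/False."""
    return all(any(model[abs(l)] == (l > 0) for l in clause) for clause in clauses)


def unit_propagate(clauses, assignment):
    """Unit propagation to a fixpoint; assignment is a set of true literals. None on conflict."""
    assign = set(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(l in assign for l in clause):
                continue                                    # clause already satisfied
            free = [l for l in clause if -l not in assign]
            if not free:
                return None                                 # every literal is false
            if len(free) == 1:
                assign.add(free[0])                         # unit clause: force its literal
                changed = True
    return assign


models = [dict(zip((S, T, F, X), vals)) for vals in product((False, True), repeat=4)]
implied = all(satisfies(REDUNDANT, m) for m in models if satisfies(ITE, m))
print(implied)                                         # True: the extra clauses are redundant
print(X in unit_propagate(ITE, {T, F}))                # False: t = f = 1 does not force x
print(X in unit_propagate(ITE + REDUNDANT, {T, F}))    # True: with the extra clauses it does
```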
Theorem 9. Let $C$ be a set of clauses, and suppose $C \models E$ where $E$ is an equi-formula. Then unit propagation on $\mathit{unify}_E(C)$ is at least as strong as unit propagation on $C$.
Proof. Unit propagation on $C$ starting from assignment $A_0$ repeatedly chooses a clause $c \cup \{l\} \in C$ where $\{\neg l' \mid l' \in c\} \subseteq A_i$ and sets $A_{i+1} := A_i \cup \{l\}$. Unit propagation terminates with $A_k$ when no such clauses occur. Note that failure is detected when $A_k$ contains both a literal and its negation.
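We show that, using an order of unit propagation on $\mathit{unify}_E(C)$ determined by that which occurs on $C$, starting from assignment $B_0 = \mathit{unify}_E(A_0)$, we always obtain an assignment $B_i$ where $B_i \supseteq \mathit{unify}_E(A_i)$. The proof is by induction on the unit propagation steps in $C$. The base case holds by construction. Assume $B_i \supseteq \mathit{unify}_E(A_i)$ and that $A_{i+1} = A_i \cup \{l\}$ is obtained from a clause $c \cup \{l\} \in C$ with $\{\neg l' \mid l' \in c\} \subseteq A_i$. Then $\mathit{unify}_E(c) \cup \{\mathit{unify}_E(l)\} \in \mathit{unify}_E(C)$ and, since $\mathit{unify}_E(\neg l') = \neg\mathit{unify}_E(l')$, we have $\{\neg\mathit{unify}_E(l') \mid l' \in c\} \subseteq \mathit{unify}_E(A_i) \subseteq B_i$. Hence by unit propagation on $\mathit{unify}_E(C)$ and $B_i$ we obtain $B_{i+1} := B_i \cup \{\mathit{unify}_E(l)\}$. Hence the induction holds.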
Given that unit propagation reaches a unique fixpoint, any unit propagation order on $\mathit{unify}_E(C)$ starting from $\mathit{unify}_E(A_0)$ will end up with an assignment $B$ where $B \supseteq B_k \supseteq \mathit{unify}_E(A_k)$.
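A small instance of Theorem 9, sketched in Python with DIMACS-style signed integers as literals (our convention): the clauses $\{\neg a, b\}, \{a, \neg b\}, \{a, b\}$ entail $a = b$. Unit propagation on them from the empty assignment derives nothing, while after applying the substitution $b \mapsto a$ the last clause collapses to the unit clause $\{a\}$. The substitution helper only maps variables to literals (not to constants) and drops tautological clauses; both are simplifications of this sketch.

```python
def unit_propagate(clauses, assignment):
    """Unit propagation to a fixpoint; assignment is a set of true literals. None on conflict."""
    assign = set(assignment)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(l in assign for l in clause):
                continue
            free = [l for l in clause if -l not in assign]
            if not free:
                return None
            if len(free) == 1:
                assign.add(free[0])
                changed = True
    return assign


def apply_substitution(clauses, theta):
    """Apply theta (variable id -> literal) to clauses, dropping tautologies and duplicates."""
    def sub(l):
        image = theta.get(abs(l), abs(l))
        return image if l > 0 else -image

    result = []
    for clause in clauses:
        new = sorted({sub(l) for l in clause})
        if not any(-l in new for l in new):      # skip clauses containing l and not l
            result.append(new)
    return result


A, B = 1, 2
C = [[-A, B], [A, -B], [A, B]]     # C entails the equality a = b
theta = {B: A}                     # unify_E for E = {b = a}: b is mapped to a
print(unit_propagate(C, set()))                              # set(): nothing is derived
print(unit_propagate(apply_substitution(C, theta), set()))   # {1}: a is derived
```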

The Power of Equi-propagation
To illustrate the impact of equi-propagation we come back to the Kakuro example from Section 2.2 (recall Figure 2). In fact, solving such puzzles via SAT encodings is quite easy, with and without equi-propagation, so the example should only be viewed as an illustration. We consider, as a baseline for this discussion, the following Boolean representation derived from a constraint model, where the declarations which are not specified explicitly are of the form new_int(I, 1, h) and h is the smallest hint for a block that includes I, or 9 if that is smaller.
Notice that there is one int_neq conjunct for each pair of white cells in the same block, and one int_array_plus conjunct for each block. Applying the equi-propagation process to $\Phi_1$ with complete equi-propagators determines six integer values, as depicted in Figure 4(a). Figure 4(b) illustrates the impact of applying the equi-propagation process where the equi-propagators are for allDiff constraints instead of for the individual int_neq constraints. This determines the values of seven integer variables and is formalized by taking the following Boolean representation of the constraint model (and introducing an equi-propagator for each conjunct).
Figure 4(c) illustrates the impact of applying the equi-propagation process where the equi-propagators are for pairs, each consisting of an allDiff constraint together with its corresponding sum constraint. This form of equi-propagation is the most powerful. It fixes integer values for all of the white cells (in this example). We stress that equi-propagation reasons only about equalities between Boolean literals and constants. Here we take the model as:
To further demonstrate the impact of equi-propagation, Table 1 provides data for 15 additional instances, categorized as "easy", "medium" and "hard". From the five columns headed "Integer Variables", the first four specify the number of unassigned white cells in the initial stage and after each of the three complete equi-propagation processes described above. From the five columns headed "Boolean variables", the first four indicate the corresponding information regarding the number of Boolean variables in the bit representations of the integers. So, the smaller the number in the table, the more variables have been removed due to equi-propagation. In particular, the $\Phi_3$ model completely solves 9 of the 15 instances. The two columns titled BEE show the corresponding information obtained using a weaker form of equi-propagation that is described in Section 4 below. The last row of the table indicates the average time it takes to perform equi-propagation (in seconds) using each of the three schemes, $\Phi_1$, $\Phi_2$, $\Phi_3$, and the weaker scheme titled BEE. We will come back to discuss this later, after detailing how equi-propagation is performed. The results in the table indicate the clear benefit of performing equi-propagation based on coarser portions of the model.

Implementing Equi-propagators
To implement complete equi-propagators we need to infer the Boolean equalities implied by a given Boolean formula, $\varphi$, and equi-formula, $E$. Based on Proposition 2, it is sufficient to test for the condition
$$\mathit{unify}_E(\varphi) \models \ell_1 \leftrightarrow \ell_2 \qquad (1)$$
We consider three techniques: using a SAT solver, using BDDs, and using ad-hoc rules applied to the Boolean representations of individual constraints.
It is straightforward to implement a complete equi-propagator using a SAT solver. To test Condition (1) we consider the formula $\psi = \mathit{unify}_E(\varphi) \wedge \neg(\ell_1 \leftrightarrow \ell_2)$. If $\psi$ is not satisfiable, then Condition (1) holds. In this way, Condition (1) can be checked for all relevant equations involving variables from $\mathit{unify}_E(\varphi)$ (and the constants 0, 1). A major obstacle with this SAT-based approach is that testing for a single equivalence, $\ell_1 \leftrightarrow \ell_2$, is at least as hard as testing for the satisfiability of $\varphi$. In fact, testing for unsatisfiability is typically more expensive. Hence the importance of our assumption that $\varphi$ is only a small fragment of the CNF of interest. In practice, SAT-based equi-propagation is surprisingly fast. For illustration, in the last row of Table 1 the average times for SAT-based complete equi-propagation for the different models are indicated in the columns $\Phi_1$, $\Phi_2$, and $\Phi_3$. It is interesting to observe that the strongest technique, using $\Phi_3$, is the fastest. This is because there are fewer (but larger) conjuncts and hence fewer queries to the SAT solver.
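To make the SAT-based test concrete, here is a sketch in which a tiny recursive DPLL procedure stands in for a real SAT solver (an assumption for illustration; a practical equi-propagator would call an actual, preferably incremental, solver). An equality $\ell_1 \leftrightarrow \ell_2$ is entailed iff adding the two clauses for $\neg(\ell_1 \leftrightarrow \ell_2)$ makes the formula unsatisfiable; a fixed literal is tested the same way with a single negated unit clause. In the example, the equation $t = f$ is simply conjoined as clauses rather than applied as a substitution.

```python
def dpll(clauses, assignment=frozenset()):
    """Minimal DPLL: True iff the clauses are satisfiable extending `assignment` (true literals)."""
    assign = set(assignment)
    changed = True
    while changed:                                   # unit propagation
        changed = False
        for clause in clauses:
            if any(l in assign for l in clause):
                continue
            free = [l for l in clause if -l not in assign]
            if not free:
                return False                         # conflict
            if len(free) == 1:
                assign.add(free[0])
                changed = True
    for clause in clauses:                           # branch on some unassigned literal
        for l in clause:
            if l not in assign and -l not in assign:
                return dpll(clauses, assign | {l}) or dpll(clauses, assign | {-l})
    return True                                      # every clause is satisfied


def entails_equal(clauses, l1, l2):
    """clauses |= (l1 <-> l2) iff clauses AND NOT(l1 <-> l2) is unsatisfiable."""
    xor = [[l1, l2], [-l1, -l2]]                     # CNF of NOT(l1 <-> l2)
    return not dpll(clauses + xor)


def entails_literal(clauses, l):
    """clauses |= l (i.e. l = 1) iff clauses AND NOT l is unsatisfiable."""
    return not dpll(clauses + [[-l]])


S, T, F, X = 1, 2, 3, 4
ITE = [[-S, -T, X], [-S, T, -X], [S, -F, X], [S, F, -X]]    # x <-> ITE(s, t, f)
TIE = [[-T, F], [T, -F]]                                    # the equation t = f, as clauses
print(entails_equal(ITE + TIE, X, T))    # True: with t = f the selector is irrelevant
print(entails_equal(ITE + TIE, X, S))    # False
```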
We can implement a complete equi-propagator using binary decision diagrams (BDDs) as follows. We construct a BDD for formula $\varphi$ at the beginning of equi-propagation. When new equational information is added to $E$, we "simplify" the BDD for $\varphi$ by conjoining it with a BDD for $E$ and then projecting out the variables that no longer appear in $\mathit{unify}_E(\varphi)$. Note that this "simplification" can increase the size of the BDD. In practice, rather than these two steps, we can use the "Restrict" operation of Coudert and Madre (1990) ("bdd simplify" in Somenzi, 2009) to create the new BDD more efficiently.
Given the BDD for $\mathit{unify}_E(\varphi)$, we can explicitly test Condition (1) using a standard BDD containment test (e.g., "bddLeq" in Somenzi, 2009). Just as in the SAT-based approach, this test is performed for all relevant equations involving variables from $\mathit{unify}_E(\varphi)$ (and the constants 0, 1). Alternatively, we can use the method of Bagnara and Schachte (1998) (extended to extract literal equalities as opposed to just variable equalities) to extract all the fixed literals and equivalent literal consequences of the BDD.
Example 10. Consider the BDD shown in Figure 5(a), which represents the formula $\varphi \equiv \mathit{new\_int}(A, 0, 3) \wedge \mathit{new\_int}(B, 0, 3) \wedge \mathit{int\_neq}(A, B)$.
A major obstacle with this BDD-based approach concerns the size of the formula $\mathit{unify}_E(\varphi)$. For some constraints, the corresponding BDD is guaranteed to be polynomial (in the size of the constraint). The following result holds for an arbitrary constraint $\varphi$, so it also holds for $\mathit{unify}_E(\varphi)$.
Proposition 3. Let $c$ be a constraint about $k$ integer variables, each represented with $n$ bits in the order encoding. Then, the number of nodes in the BDD representing
Proof. (Sketch) There are only $n + 1$ legitimate states for each $n$-bit unary variable, and the BDD cannot have more nodes than possible states.
Constraints like new_int, int_neq, and int_plus involve at most 3 integer variables, and hence their BDD-based complete equi-propagators are polynomially bounded. However, this is not the case for global constraints such as allDiff and int_array_plus, where the arity is not fixed. Moreover, it is well known that the allDiff constraint does not have a polynomial-sized BDD (Bessiere, Katsirelos, Narodytska, & Walsh, 2009).
Figure 5: Full (dashed) lines correspond to true (false) edges. Edges to the false node "F" are omitted for brevity.
Given the potentially exponential run time of SAT-based equi-propagation, and the potentially exponential size of BDD-based equi-propagators, we consider a third approach, where we implement equi-propagation by a collection of ad-hoc transition rules for each type of constraint. While this approach is not complete (there are equations implied by a constraint that are not detected), the implementation is fast and works well in practice. This is the topic of the next section.

Ad-hoc Equi-Propagation
We consider a rule-based approach to define equi-propagators. The definition is given as a set of ad-hoc rules specified for each type of constraint. The novelty is that the approach is not based on CNF, as in previous works, but rather driven by the bit-blasted constraints that are to be encoded to CNF. Our presentation focuses on the case where finite domain integers are represented in the order encoding. For an integer $X = [x_1, \ldots, x_n]$, we often write: $X \geq i$ to denote the equation $x_i = 1$, $X < i$ to denote the equation $x_i = 0$, $X \neq i$ to denote the equation $x_i = x_{i+1}$, and $X = i$ to denote the pair of equations $x_i = 1$, $x_{i+1} = 0$. Moreover, to simplify notation when specifying the rules below, we view $X = [x_1, \ldots, x_n]$ as a larger vector padded with sentinel cells, such that all cells "to the left of" $x_1$ take the value 1 and all cells "to the right of" $x_n$ take the value 0. Basically, this facilitates the specification of the "end cases" in our formalism. We now consider each of the 5 constraints in the language fragment presented in Section 2.
Figure 6: Ad-hoc rules for (a) new_int and (b) int_neq.
Figure 7: Ad-hoc rules for (a) allDiff and (b) int_plus.
(1) The two rules in Figure 6(a) derive from the monotonicity of the order encoding representation. They basically correspond to unit propagation, but at the constraint level.
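When the bits involved are already constants, the two monotonicity rules of Figure 6(a) reduce to the propagation sketched below. The list representation (`1`, `0`, or `None` for an unknown bit) and the name `propagate_ordered` are our own assumptions; the rules in the figure also act on equations between variables, which this sketch ignores.

```python
def propagate_ordered(bits):
    """Constraint-level propagation for ordered(bits); entries are 1, 0 or None."""
    out = list(bits)
    n = len(out)
    # Rule 1: a bit x_i = 1 forces every bit to its left to 1.
    last_one = max((k for k in range(n) if out[k] == 1), default=-1)
    # Rule 2: a bit x_i = 0 forces every bit to its right to 0.
    first_zero = min((k for k in range(n) if out[k] == 0), default=n)
    if last_one > first_zero:
        raise ValueError("bits cannot be ordered")
    for k in range(last_one + 1):
        out[k] = 1
    for k in range(first_zero, n):
        out[k] = 0
    return out
```

For example, `[None, None, 1, None, 0, None]` becomes `[1, 1, 1, None, 0, 0]`.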
(2) The first rule in Figure 6(b) considers cases when X is a constant (the symmetric case is handled by exchanging X and Y). The other two rules capture templates that commonly arise in the equi-propagation process. To illustrate the justification of the third rule, consider all possible truth values for the variables $x_i$ and $x_{i+1}$: (a) if $x_i = 0$ and $x_{i+1} = 1$, then both integers in the relation take the form $[\ldots, 0, 1, \ldots]$, violating their specification as ordered, so this is not possible; (b) if $x_i = 1$ and $x_{i+1} = 0$, then both numbers take the form $[1, \ldots, 1, 0, \ldots, 0]$ and are equal, violating the neq constraint. Hence the only possible bindings for $x_i$ and $x_{i+1}$ are those where $x_i = x_{i+1}$.
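Rules such as those of Figure 6(b) can be validated by brute force: enumerate every model of the constraint and collect the bit constants and bit (anti-)equalities that hold in all of them. The sketch below (hypothetical names; exponential, so only for tiny domains) does this for int_neq(X, Y) with X fixed to a constant c, and recovers the single equation $y_c = y_{c+1}$, which is $Y \neq c$ in the notation above.

```python
from itertools import combinations


def order_bits(value, n):
    """Order encoding of value in [0, n]: (x_1, ..., x_n) with x_k = 1 iff value >= k."""
    return tuple(1 if value >= k else 0 for k in range(1, n + 1))


def implied_equations(models):
    """Constants and (anti-)equalities between 1-based bit positions true in every model."""
    n = len(models[0])
    consts = {}
    for k in range(n):
        vals = {m[k] for m in models}
        if len(vals) == 1:
            consts[k + 1] = vals.pop()
    same, opposite = [], []
    for a, b in combinations(range(n), 2):
        if a + 1 in consts or b + 1 in consts:
            continue
        if all(m[a] == m[b] for m in models):
            same.append((a + 1, b + 1))
        elif all(m[a] != m[b] for m in models):
            opposite.append((a + 1, b + 1))
    return consts, same, opposite


def neq_constant_models(c, n):
    """Order encodings of every Y in [0, n] with Y != c, i.e. int_neq(X, Y) with X = c."""
    return [order_bits(v, n) for v in range(n + 1) if v != c]


print(implied_equations(neq_constant_models(2, 3)))  # ({}, [(2, 3)], [])
```

For c = 0 the same procedure instead reports the constant $y_1 = 1$, which is again $Y \neq 0$ once the left sentinel is taken into account.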
(3) In Figure 7(a) we illustrate a single rule for the allDiff constraint, which considers Hall sets of size 2. Here each $Z_i$ represents an integer in the order encoding, and we focus on the case when $Z_1$ and $Z_2$ are restricted by the equations in E to take only two possible values, i or j. This can be expressed in E because $[x_1, \ldots, x_n] \in \{i, j\}$ (for $i < j$) means that $x_k = 1$ for $k \leq i$, $x_k = x_{k+1}$ for $i < k < j$, and $x_k = 0$ for $j < k \leq n$. For $Z_1 = [x_1, \ldots, x_n]$ and $Z_2 = [y_1, \ldots, y_n]$, the constraint $Z_1 \neq Z_2$ then amounts to adding the single equation $x_{i+1} = \neg y_{i+1}$, because $Z_1$ and $Z_2$ can take only the two values i and j, and bit $i+1$ (equal to bits $i+2, \ldots, j$) is the bit that distinguishes them.
In addition to this rule, we apply the rules for int_neq($Z_i$, $Z_j$) for each pair of integers $Z_i$ and $Z_j$ in the constraint.
(5) There are no special ad-hoc rules for equi-propagation of an int_array_plus constraint. Such a constraint is simply viewed as a decomposition into a set of int_plus constraints, and simplification is then performed at that level using the rules for int_plus. The decomposition of int_array_plus is explained in Section 6.
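The two-value equations behind the Hall-set rule for allDiff in item (3) can be checked exhaustively. Under the convention $x_k = 1 \iff X \geq k$, the sketch below (hypothetical helper names) verifies that the equations $x_k = 1$ for $k \leq i$, $x_k = x_{k+1}$ for $i < k < j$ and $x_k = 0$ for $k > j$ admit exactly the values i and j, and that the bit separating the two values is $x_{i+1}$; for $i \geq 1$ the bit $x_i$ itself is fixed to 1 and cannot distinguish them.

```python
def order_bits(value, n):
    """Order encoding of value in [0, n]: [x_1, ..., x_n], x_k = 1 iff value >= k."""
    return [1 if value >= k else 0 for k in range(1, n + 1)]


def in_two_values(bits, i, j):
    """The equations stating X in {i, j} (i < j), checked on concrete bits."""
    n = len(bits)

    def x(k):  # sentinels: 1 to the left of x_1, 0 to the right of x_n
        return 1 if k < 1 else (0 if k > n else bits[k - 1])

    return (all(x(k) == 1 for k in range(1, i + 1))
            and all(x(k) == x(k + 1) for k in range(i + 1, j))
            and all(x(k) == 0 for k in range(j + 1, n + 1)))


def hall_pair_bit(i):
    """1-based position of the bit that distinguishes value i from any value j > i."""
    return i + 1
```

For two integers confined to $\{i, j\}$ and required to differ, equating bit `hall_pair_bit(i)` of one with the negation of the same bit of the other is therefore enough.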
Example 11 (ad-hoc equi-propagation). Consider the following (partial) constraint model, from the context of the Kakuro example of Section 2.2, where we represent the variables X, Y and Z as $X = [x_1, \ldots, x_9]$, $Y = [y_1, \ldots, y_9]$ and $Z = [z_1, \ldots, z_{18}]$, and assume that some previous equi-propagation (on other constraints) has determined the current equi-formula $E_0$ to specify that the integer variable $Z = 4$:
Figure 8 illustrates, step by step, the equi-propagation process on C using the ad-hoc rules defined above. Each step corresponds to the application of one of these rules, as indicated by the label on the transition. At each stage we illustrate the derived equations (top part) and their application (as a unifier) to the state variables X, Y and Z (lower part).
Figure 9: Simplification rules for new_int (crossed-out elements have been removed).
To summarize, let us come back to Table 1. The numbers presented in the two columns headed "BEE" specify the number of variables remaining after the application of ad-hoc equi-propagation. We also observe that our definition of ad-hoc equi-propagation is trivially monotonic.

Constraint Model Partial Evaluation
Partial evaluation, together with equi-propagation, is the second important component in our approach to compiling constraint models to CNF. Partial evaluation simplifies a given constraint model in view of information that becomes available due to equi-propagation. Typically, in the constraint simplification process, we apply alternating steps of equi-propagation and partial evaluation. Examples of partial evaluation include constant elimination and the removal of constraints that are tautologies. In this section we detail the partial evaluation rules for the five constraint types defined in the language fragment presented in Section 2.
(1) A new_int(I, $c_1$, $c_2$) constraint specifies that an integer $I = [x_1, \ldots, x_n]$ is represented in the order encoding, and in particular that the corresponding bit sequence is sorted (non-increasing). We denote this as ordered($[x_1, \ldots, x_n]$). Partial evaluation focuses on this aspect of the constraint and ignores the bounds $c_1$, $c_2$ specified in the constraint. The table in Figure 9 specifies four simplification rules: the first identifies tautologies, the second and third remove leading ones and trailing zeros, and the fourth removes one of two equated bits. In this figure, and in subsequent ones, a crossed-out element in a sequence indicates that it has been removed from the sequence.
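The four rules for ordered can be sketched as a single pass over a bit list. This is our own simplification under a stated assumption: equi-propagation has already run, so constants appear only as leading ones or trailing zeros, and equated bits are adjacent copies of the same literal. The literal names are hypothetical. Applied to a binding of the form $[1, i_{3,2}, i_{3,3}, i_{3,4}, i_{3,5}, i_{3,7}, i_{3,7}, 0]$, it removes three bits.

```python
def simplify_ordered(bits):
    """Partial evaluation of ordered(bits) (illustrative sketch).

    bits holds the constants 1/0 or literal names; equi-propagation is assumed
    to have run already, so constants only occur as leading ones or trailing zeros.
    Returns (remaining_bits, is_tautology).
    """
    out = list(bits)
    while out and out[0] == 1:      # remove leading ones
        out.pop(0)
    while out and out[-1] == 0:     # remove trailing zeros
        out.pop()
    kept = []
    for b in out:                   # equated neighbours: keep a single copy
        if not kept or kept[-1] != b:
            kept.append(b)
    return kept, len(kept) <= 1     # zero or one bit is trivially ordered
```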
(2) The simplification rules for an int_neq constraint, shown in Figure 10(a), are symmetric when exchanging the roles of X and Y. The first two rules identify tautologies. The third rule concerns X and Y that have an equal bit at position i: the corresponding bits can be removed from the representations of X and Y, resulting in shorter bit lists. The last two rules remove leading ones and trailing zeros, and are illustrated by the following example.
Figure 10: (a) Simplification rules for int_neq and (b) an example of their application.
Figure 11: Simplification rules for allDiff.
(3) Four rules for simplifying allDiff constraints are illustrated in Figure 11. The first is about detecting tautologies. The second identifies cases when one of the integers in the constraint (assume $Z_1$) has a domain disjoint from all of the others; this rule also captures the case when $Z_1$ is a constant. The third rule removes a Hall set of size 2 (assume $\{Z_1, Z_2\}$) from the constraint. Note that the corresponding equi-propagation rule detects that the values of $Z_3, \ldots, Z_n$ are different from the values of $\{Z_1, Z_2\}$, after which the fourth rule applies. The fourth rule is for the case when none of the integers in the constraint can take a certain value i; it also captures the case when all of the numbers have leading ones or trailing zeros. The last two rules are illustrated in Example 14.
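At the level of explicit value sets, rather than BEE's bit vectors, the effect of the tautology, disjoint-domain and Hall-set rules for allDiff can be approximated as follows. This is a hypothetical domain-level sketch: it omits the fourth rule, which removes a bit position that no integer uses and has no counterpart on explicit sets, and it does not detect failure (empty domains).

```python
from itertools import combinations


def simplify_all_diff(domains):
    """Domain-level reading of allDiff simplification (illustrative sketch).

    domains: dict mapping integer names to sets of possible values.
    Returns the domains of the integers that remain in the constraint.
    """
    doms = {z: set(d) for z, d in domains.items()}
    changed = True
    while changed:
        changed = False
        # Hall set of size 2: it leaves the constraint, and its values leave the others.
        for a, b in combinations(sorted(doms), 2):
            if len(doms[a]) == 2 and doms[a] == doms[b]:
                pair = doms.pop(a)
                doms.pop(b)
                for z in doms:
                    doms[z] -= pair
                changed = True
                break
        if changed:
            continue
        # An integer whose domain is disjoint from all the others leaves the constraint.
        for z in sorted(doms):
            others = set().union(*(d for y, d in doms.items() if y != z))
            if not doms[z] & others:
                del doms[z]
                changed = True
                break
    return doms  # with at most one integer left, the allDiff is a tautology
```

The test mirrors the situation of Example 14, with $I_1$ and $I_2$ restricted to $\{6, 8\}$.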
(4 & 5) The simplification rules for int_plus shown in Figure 12 are symmetric when exchanging the roles of X and Y. The first two apply where (at least) one of X, Y and Z is a constant: because equi-propagation has already been applied to the constraint, it is then a tautology (see Example 13). The last two rules remove leading ones and trailing zeros.
Figure 12: Simplification rules for int_plus.
To summarize the rule-based approach to equi-propagation and partial evaluation, we present the following sequence of three examples, which focus on the simplification of the three constraints given in Figure 13, where the integer variables $I_1, \ldots, I_8$ are defined in the range between 1 and 8 and where K = 14.
Example 13. Consider equi-propagation of constraint (a) from Figure 13, where $E_0$ specifies that K = 14:
Given $E_1$, the constraint is a tautology and is removed by partial evaluation:
Example 14. Consider equi-propagation of constraint (b) from Figure 13, given $E_1$ from Example 13:
Given $E_2$, the equi-propagation rule for allDiff detects that $\{I_1, I_2\}$ is a Hall set (where the two variables take the values 6 and 8) and adds to $E_2$ the set of equations $E'$ specifying that $I_3, I_4, I_5, I_6, I_7, I_8 \neq 6, 8$. The result is $E_3 = E_2 \cup E'$, and this step gives the following bindings (the impact of $E'$ is that, for each $I_k$ with $k \geq 3$, the bits at positions 6 and 7 are equated and the last bit is fixed to 0):

$I_2 = [1, 1, 1, 1, 1, 1, \neg i_{1,7}, \neg i_{1,7}]$
$I_3 = [1, i_{3,2}, i_{3,3}, i_{3,4}, i_{3,5}, i_{3,7}, i_{3,7}, 0]$
$I_5 = [1, i_{5,2}, i_{5,3}, i_{5,4}, i_{5,5}, i_{5,7}, i_{5,7}, 0]$
$I_6 = [1, i_{6,2}, i_{6,3}, i_{6,4}, i_{6,5}, i_{6,7}, i_{6,7}, 0]$
$I_7 = [1, i_{7,2}, i_{7,3}, i_{7,4}, i_{7,5}, i_{7,7}, i_{7,7}, 0]$

Given $E_3$, partial evaluation of the constraint first removes the Hall set:
and then removes three redundant bits in the underlying representation of each remaining integer (which is not equal to 0, 6 or 8):
Example 15. Consider equi-propagation of constraint (c) from Figure 13, given $E_3$ from Example 14. The rules that apply derive from the decomposition of the int_array_plus constraint into its int_plus parts. These dictate that $I_3, I_4, I_5 \leq 5$:
Applying partial evaluation simplifies the constraint as follows:
To summarize Examples 13-15, observe that in the initial constraint model, 3 constraints about 8 integers are represented in 56 bits. After constraint simplification, 2 constraints remain, and the 8 integers are represented using 28 bits:
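Example 13 relies on int_plus becoming a tautology once one argument is a constant. For the case Z = c, a short derivation in the order encoding gives $Y \geq k \iff X \leq c - k \iff \neg(X \geq c + 1 - k)$, that is, $y_k = \neg x_{c+1-k}$; the sentinel cases k = n + 1 and k = 0 of the same equations yield the bounds $X \geq c - n$ and $X \leq c$. The Python sketch below (our own check, not part of BEE; names hypothetical) confirms that, within these bounds, binding the bits of Y this way always produces the order encoding of Y = c - X, so the constraint imposes nothing further. The test uses c = 14 and n = 8, the values of K and of the integer width in Examples 13 to 15.

```python
def bit(value, k):
    """x_k of value in the order encoding, sentinels included: 1 iff value >= k."""
    return 1 if value >= k else 0


def plus_constant_equations(c, n):
    """For X + Y = c over [0, n]: {k: j} meaning the equation y_k = not x_j."""
    return {k: c + 1 - k for k in range(1, n + 1)}


def y_from_equations(x, c, n):
    """Value of Y once its bits are bound by the equations, for a given X = x."""
    eqs = plus_constant_equations(c, n)
    return sum(1 - bit(x, eqs[k]) for k in range(1, n + 1))
```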

Compiling Constraints with BEE
BEE (Ben-Gurion Equi-propagation Encoder) is a tool which encodes finite domain constraint models to CNF. BEE was first introduced by Metodi and Codish (2012). During the encoding process, BEE performs optimizations based on equi-propagation and partial evaluation to improve the quality of the target CNF. BEE is implemented in (SWI) Prolog and can be applied in conjunction with the CryptoMiniSAT solver (Soos, 2010) through a Prolog interface (Codish, Lagoon, & Stuckey, 2008). CryptoMiniSAT offers direct support for xor clauses, and BEE takes advantage of this feature. BEE can be downloaded (Metodi, 2012), where one can also find the examples from this paper and others.
The source language for the BEE compiler is also called BEE. It is a constraint modeling language similar to FlatZinc (Nethercote et al., 2007), but with a focus on the subset of the language relevant to finite domain constraint problems. Five of the constraint constructs in the BEE language are those introduced in Section 2.1. The full language is presented in Table 2.
In BEE, the Boolean constants "true" and "false" are viewed as the (integer) values "1" and "0". Constraints are represented as (a list of) Prolog terms. Boolean and integer variables are represented as Prolog variables, which may be instantiated when simplifying constraints. In Table 2, X and Xs (possibly with subscripts) denote a literal (a Boolean variable or its negation) and a vector of literals, I (possibly with subscript) denotes an integer variable, and c (possibly with subscript) denotes an integer constant. The right column of the table gives brief explanations of the constraints. The table introduces 26 constraint templates.
A main design choice of BEE is that all integer variables are represented in the order encoding. Consequently, BEE is suited to problems in which the integer variables take small or medium-sized values. The compilation of a constraint model to CNF using BEE goes through three phases.
1. Unary bit-blasting: integer variables (and constants) are represented as bit vectors in the order-encoding.
2. Constraint simplification: three types of actions are applied: equi-propagation, partial evaluation, and decomposition of constraints.Simplification is applied repeatedly until no rule is applicable.
the constraint is replaced by a constraint of the form int_eq(A, Sum), which equates the bits of A and Sum, or, if As = [$A_1$, $A_2$], it is replaced by int_plus($A_1$, $A_2$, Sum). In the general case, As is split into two halves, constraints are generated to sum these halves, and an additional int_plus constraint is introduced to sum the two sums.
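The halving decomposition described above can be sketched as a recursive function producing constraint tuples. The tuple format and the fresh names `S1`, `S2`, ... are hypothetical; in BEE the intermediate sums would also be declared as integers with suitable domains, which the sketch omits. Because the splits form a binary tree, n summands always yield n - 1 int_plus constraints.

```python
from itertools import count


def decompose_array_plus(terms, total, fresh=None):
    """Split int_array_plus(terms, total) into int_eq / int_plus constraint tuples.

    Intermediate sums get hypothetical fresh names S1, S2, ... (their integer
    declarations are omitted in this sketch).
    """
    if fresh is None:
        fresh = (f"S{i}" for i in count(1))
    if len(terms) == 1:
        return [("int_eq", terms[0], total)]
    if len(terms) == 2:
        return [("int_plus", terms[0], terms[1], total)]
    mid = len(terms) // 2
    left, right = next(fresh), next(fresh)
    return (decompose_array_plus(terms[:mid], left, fresh)
            + decompose_array_plus(terms[mid:], right, fresh)
            + [("int_plus", left, right, total)])
```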
As another example, consider the int_plus($A_1$, $A_2$, A) constraint. One approach, supported by BEE, decomposes the constraint as an odd-even merger (from the context of odd-even sorting networks) (Batcher, 1968). Here, the sorted bit sequences $A_1$ and $A_2$ are merged to obtain their sum A. This results in a model with $O(n \log n)$ comparator constraints (and later in an encoding with $O(n \log n)$ clauses). Another approach, also supported in BEE, does not decompose the constraint but encodes it directly to a CNF of size $O(n^2)$, as in the context of so-called totalizers (Bailleux & Boufkhad, 2003). A hybrid approach leaves the choice to BEE, depending on the size of the domains of the variables involved. Finally, we note that the user can configure BEE to fix the way it compiles this constraint (and others).
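The odd-even merger can be simulated on concrete bits to see both claims: merging two sorted (non-increasing) bit lists yields the unary sum, and for two inputs of length $n = 2^m$ the number of comparators is $n \log_2 n + 1$. This is a plain Python simulation of Batcher's merge (our own, for power-of-two lengths only), not BEE's encoding; there, each comparator constraint is later compiled to clauses.

```python
def comparator(a, b):
    """Two-bit sorter for non-increasing order: (a or b, a and b)."""
    return max(a, b), min(a, b)


def odd_even_merge(a, b, stats):
    """Batcher's odd-even merge of two non-increasing bit lists of length 2^m."""
    n = len(a)
    if n == 1:
        stats["comparators"] += 1
        return list(comparator(a[0], b[0]))
    v = odd_even_merge(a[0::2], b[0::2], stats)   # merge bits at odd positions
    w = odd_even_merge(a[1::2], b[1::2], stats)   # merge bits at even positions
    out = [v[0]]
    for i in range(n - 1):
        stats["comparators"] += 1
        out.extend(comparator(w[i], v[i + 1]))
    out.append(w[n - 1])
    return out


def unary(value, n):
    """Unary (order) encoding of value in [0, n] as a sorted bit list."""
    return [1] * value + [0] * (n - value)
```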
CNF encoding is the last phase in the compilation of a constraint model. Each of the remaining simplified (bit-blasted) constraints is encoded directly to CNF. These encodings are standard and similar to those applied in other tools; in particular, they are similar to those applied in Sugar (Tamura et al., 2009).

The All-Different Constraint in BEE
The all-different constraint specifies that a set of integer variables take all different values from their specified domains.This constraint has received much attention in the literature (see for example the survey in van Hoeve, 2001).BEE provides special treatment for this constraint.
In many applications, all-different constraints are used to model the special case when the constraint is about a "permutation", namely, when $[I_1, \ldots, I_n]$ are all different and can take precisely n different values (so that each value is taken by exactly one variable). BEE identifies this special case and applies two additional ad-hoc equi-propagation rules for it. The table of Figure 14 illustrates these rules. We annotate the constraint with a "*" to emphasize that it has been detected to be about a permutation. The first rule is about the case when only one integer (assume $Z_1$) can take the value i. The second rule is about the case where two values (assume i and j) cannot be taken by any variable except two (assume $Z_1$ and $Z_2$). Because the constraint is about a permutation, we can then determine that $Z_1$ and $Z_2$ must take the two values i and j. To illustrate the second rule, consider the following example.
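Over explicit value sets, one pass of the two permutation rules can be sketched as below. The function name and the set-based representation are hypothetical; BEE applies the rules as equations on bits. The test replays the situation of Example 16.

```python
from itertools import combinations


def permutation_rules(domains, values):
    """One pass of the two permutation rules over explicit domains (sketch)."""
    doms = {z: set(d) for z, d in domains.items()}
    if len(values) != len(doms):
        raise ValueError("a permutation needs as many values as integers")
    # Rule 1: if only one integer can take value v, it must take v.
    for v in values:
        holders = [z for z in doms if v in doms[z]]
        if len(holders) == 1:
            doms[holders[0]] = {v}
    # Rule 2: if only two integers can take u or v, they take exactly {u, v}.
    for u, v in combinations(values, 2):
        holders = [z for z in doms if doms[z] & {u, v}]
        if len(holders) == 2:
            for z in holders:
                doms[z] &= {u, v}
    return doms
```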
Example 16. Consider a constraint allDiff($I_1, \ldots, I_5$) on 5 integer variables taking values in the interval [0, 4] (exactly 5 values), where $E_0$ specifies that $I_3$, $I_4$ and $I_5$ cannot take the values 0 and 1. Therefore we introduce equations which restrict $I_1$ and $I_2$ to take the values 0 and 1, and the corresponding ad-hoc rule for permutation applies:

To facilitate the implementation of ad-hoc equi-propagation of all-different constraints, BEE adopts a dual representation for integer variables occurring in these constraints, combining the order encoding and the so-called direct encoding. This is essentially the same as the encoding proposed by Gent and Nightingale (2004). When declaring an integer variable I, the bit-blast in the order encoding applies the corresponding unification $I = [x_1, \ldots, x_n]$. When encountering I in an allDiff constraint, an additional bit-blast introduces $I' = [d_0, \ldots, d_n]$ in the direct encoding, and a channeling formula channel(I, I′) is introduced.
The direct encoding is a unary representation $I' = [d_0, \ldots, d_n]$ where each bit $d_i$ is true if and only if $I = i$, so exactly one of the bits takes the value true. For example, the value 3 in the interval [0, 5] is represented in 6 bits as [0, 0, 0, 1, 0, 0]. In the dual representation, the following channeling formula captures the relation between the two representations $I = [x_1, \ldots, x_n]$ and $I' = [d_0, \ldots, d_n]$ of an integer variable (with the sentinels $x_0 = 1$ and $x_{n+1} = 0$):

$$\mathit{channel}(I, I') = \bigwedge_{i=0}^{n} \big( d_i \leftrightarrow (x_i \wedge \neg x_{i+1}) \big)$$

Consider an allDiff constraint about m integer variables that take values between 0 and n. During constraint simplification, the allDiff($[I_1, \ldots, I_m]$) constraint is viewed through its direct encoding as a bit matrix where row i consists of the bits $[d_{i0}, \ldots, d_{in}]$ of $I_i$. The element $d_{ij}$ is true iff $I_i$ takes the value j. The j-th column specifies which of the $I_i$ take the value j, and hence at most one bit in a column may be true. This representation has one main advantage: in the direct encoding we can decompose allDiff($[I_1, \ldots, I_m]$) into a conjunction of n + 1 constraints, one for each column $0 \leq j \leq n$, of the form bool_array_sum_leq($[d_{1j}, \ldots, d_{mj}]$, 1), which is arc-consistent. As soon as $d_{i,j} = 1$ ($I_i = j$), we have $d_{i,j'} = 0$ ($I_i \neq j'$) for all $j' \neq j$. In contrast, in the order encoding alone, the decomposition into $O(m^2)$ constraints int_neq($I_i$, $I_j$) with $i < j$ is not arc-consistent. We illustrate the advantage of the dual encoding for the allDiff constraint in Section 8.1.
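The channeling formula and the column view can be checked on concrete values. In this sketch (hypothetical helper names), `channel_holds` tests $d_i \leftrightarrow (x_i \wedge \neg x_{i+1})$ with the sentinels $x_0 = 1$ and $x_{n+1} = 0$, and `columns_at_most_one` tests the per-column at-most-one condition on the direct encodings of a list of values.

```python
def order_bits(value, n):
    """Order encoding [x_1, ..., x_n]: x_k = 1 iff value >= k."""
    return [1 if value >= k else 0 for k in range(1, n + 1)]


def direct_bits(value, n):
    """Direct encoding [d_0, ..., d_n]: d_i = 1 iff value == i."""
    return [1 if value == i else 0 for i in range(n + 1)]


def channel_holds(x, d):
    """channel: d_i <-> (x_i and not x_{i+1}), with sentinels x_0 = 1, x_{n+1} = 0."""
    ext = [1] + list(x) + [0]       # ext[k] is x_k for k = 0..n+1
    return all(d[i] == (ext[i] & (1 - ext[i + 1])) for i in range(len(x) + 1))


def columns_at_most_one(values, n):
    """allDiff through the direct encoding: each value column holds at most one 1."""
    rows = [direct_bits(v, n) for v in values]
    return all(sum(row[j] for row in rows) <= 1 for j in range(n + 1))
```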

Using BEE
A typical BEE application has the form depicted in Figure 15, where the predicate solve/2 takes a problem Instance and provides a Solution. The specifics of the application are in the call to encode/3, which, given the Instance, generates the Constraints that solve it together with a Map relating instance variables to constraint variables. The calls to bCompile/2 and sat/1 compile the constraints to CNF and solve it by applying a SAT solver.
If the instance has a solution, the SAT solver binds the constraint variables accordingly.
Then, the call to decode/2, using the Map, provides a Solution in terms of the Instance variables. The definitions of encode/3 and decode/2 are application dependent and provided by the user. The predicates bCompile/2 and sat/1 are part of the tool and provide the interface to BEE and the underlying SAT solver.

Example BEE Application: Magic Graph Labeling
We illustrate the application of BEE, using Prolog as a modeling language, to solve a graph labeling problem. Graph labeling is about finding an assignment of integers to the vertices and edges of a graph subject to certain conditions. Graph labelings were introduced in the 1960s, and hundreds of papers on a wide variety of related problems have been published since then; see for example the survey by Gallian (2011), with more than 1200 references. Graph labelings have many applications, for instance in radar, X-ray crystallography and coding theory. We focus here on the vertex-magic total labeling (VMTL) problem, where one should find for the graph G = (V, E) a labeling that is a one-to-one map $V \cup E \to \{1, 2, \ldots, |V| + |E|\}$ with the property that the sum of the labels of a vertex and its incident edges is a constant K, independent of the choice of vertex. A problem instance takes the form vmtl(G, K), specifying the graph G and a constant K. In the context of Figure 15, the query solve(vmtl(G, K), Solution) poses the question: "Does there exist a VMTL labeling for G with magic constant K?" It binds Solution to such a labeling if one exists, or to "unsat" otherwise. Figure 16 illustrates an example problem instance together with its solution.
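To make the VMTL definition concrete, here is a small checker and an exhaustive search over labelings (our own sketch with hypothetical names, not the BEE model of Figure 17; brute force is only feasible for tiny graphs). On the triangle $C_3$, with |V| + |E| = 6 labels, the search finds the magic constants 9, 10, 11 and 12; for instance, the vertex labels a = 5, b = 6, c = 4 with edge labels ab = 1, bc = 2, ca = 3 give every vertex the sum 9.

```python
from itertools import permutations


def vertex_sums(vertices, edges, label):
    """Label of each vertex plus the labels of its incident edges."""
    return {v: label[v] + sum(label[e] for e in edges if v in e) for v in vertices}


def is_vmtl(vertices, edges, label, k):
    """Check a vertex-magic total labeling with magic constant k."""
    items = list(vertices) + list(edges)
    if sorted(label[x] for x in items) != list(range(1, len(items) + 1)):
        return False  # not a one-to-one map onto 1..|V|+|E|
    return all(s == k for s in vertex_sums(vertices, edges, label).values())


def magic_constants(vertices, edges):
    """All magic constants by exhaustive search (feasible only for tiny graphs)."""
    items = list(vertices) + list(edges)
    found = set()
    for perm in permutations(range(1, len(items) + 1)):
        sums = set(vertex_sums(vertices, edges, dict(zip(items, perm))).values())
        if len(sums) == 1:
            found |= sums
    return found
```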
Figure 17 illustrates a Prolog program that implements the encode/3 predicate for the VMTL problem. The call to the predicate declareInts/4 introduces the constraints which declare the integer variables for each vertex and edge in the graph, and generates the map. The call to the predicate sumToK/5 introduces the constraints that require the sum of the labels of each vertex and its incident edges to equal K. The auxiliary predicate getVars/3 receives a list of identifiers (vertices and edges) and extracts the corresponding list of integer variables from the map.

Figure 18: A VMTL instance with the constraints and map generated by encode/3 (panels: An Instance, The Graph, A Solution, The Map, The Constraints).
Given the VMTL instance from Figure 16, the call to predicate encode/3 from Figure 17 generates the map and the constraints detailed in Figure 18.
In Section 8.3 we report that using BEE enables us to solve interesting instances of the VMTL problem not previously solvable by other techniques.

BumbleBEE
The BEE distribution also includes a command-line solver, which we call BumbleBEE. BumbleBEE enables one to specify a BEE model in an input file where each line contains a single constraint from the model and the last line specifies the type of goal. BumbleBEE reads the input file, compiles the constraint model to CNF, solves the CNF using the embedded CryptoMiniSAT solver (Soos, 2010), and outputs a set of bindings to the declared variables in the model (or a message indicating that the constraints are not satisfiable). Figure 19 shows, on the left, the BumbleBEE input file for the VMTL instance from Figure 16 and, on the right, the BumbleBEE output, which is a solution for the constraint model. The last line of the input file specifies the goal to the solver. The options are:
1. solve satisfy: solve for a single satisfying assignment to the constraint model;
2. solve satisfy(c): solve for (at most) c satisfying assignments to the constraint model, where c is an integer value; when c ≤ 0, this option solves for all solutions;
3. solve minimize(I): solve for a solution which minimizes the value of the integer variable I. The solver outputs the intermediate solutions (with decreasing values of I) encountered during the search for the minimum value of I.
Further details and more examples can be found in the BEE distribution (Metodi & Codish, 2012).

Experiments
We report on our experience in applying BEE. To appreciate its ease of use, the reader is encouraged to view the example encodings available with the tool (Metodi & Codish, 2012) using SWI Prolog v6.0.2 64-bits. Comparisons with Sugar (v1.15.0) are based on the use of identical constraint models, apply the same SAT solver (CryptoMiniSAT v2.5.1), and run on the same machine. Times are reported in seconds.

Quasigroup Completion Problems
A Quasigroup Completion Problem (QCP), proposed by Gomes, Selman, and Crato (1997) as a constraint satisfaction benchmark, is given as an n × n board of integer variables (in the range [1, n]), some of which are assigned integer values. The task is to assign values to all variables so that no column or row contains the same value twice. The constraint model is a conjunction of allDiff constraints. Ansótegui, del Val, Dotú, Fernández, and Manyà (2004) argue the advantage of the direct encoding for QCP.
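The QCP constraint model consists of one allDiff per row and one per column. The sketch below (hypothetical names) lists those groups of cells and checks a candidate completion against the preassigned cells; it verifies a solution rather than searching for one.

```python
def all_diff_groups(n):
    """Cells of each row and each column: one allDiff constraint per group."""
    rows = [[(r, c) for c in range(n)] for r in range(n)]
    cols = [[(r, c) for r in range(n)] for c in range(n)]
    return rows + cols


def is_completion(hints, board):
    """True if board (n x n, values 1..n) extends the hints and satisfies every allDiff."""
    n = len(board)
    if any(board[r][c] != v for (r, c), v in hints.items()):
        return False
    if any(not 1 <= board[r][c] <= n for r in range(n) for c in range(n)):
        return False
    return all(len({board[r][c] for r, c in g}) == n for g in all_diff_groups(n))
```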
We consider 15 instances from the 2008 CSP competition. Table 3 considers three settings: BEE with its dual encoding for allDiff constraints, BEE using only the order encoding (equivalent to using int_neq constraints instead of allDiff), and Sugar. The table shows: the instance identifier (and whether it is "sat" or "unsat"), compilation time (comp) in seconds, clauses in the encoding (clauses), variables in the encoding (vars), and SAT solving time (SAT) in seconds.
The results indicate that: (1) application of BEE using the dual representation for allDiff is 38 times faster and produces 20 times fewer clauses (on average) than when using the order encoding alone (despite the need to maintain two encodings); (2)

Mancini, Micaletto, Patrizi, and Cadoli (2008) provide a comparison of several state-of-the-art solvers applied to the DNA word problem with a variety of encoding techniques. Their best reported result is a solution with 87 DNA words, obtained in 554 seconds, using an OPL (van Hentenryck, 1999) model with lexicographic order to break symmetry. Frutos, Liu, Thiel, Sanner, Condon, Smith, and Corn (1997) present a strategy to solve this problem in which the four letters are modeled by bit-pairs [t, m]. Each eight-letter word can then be viewed as the combination of a "t-part", $[t_1, \ldots, t_8]$, which is a bit-vector, and an "m-part", $[m_1, \ldots, m_8]$, also a bit-vector. The authors report a solution composed from two pairs of (t-part, m-part) sets. This forms a set S with (6 × 16) + (2 × 6) = 108 DNA words. Marc van Dongen reports a larger solution with 112 words.

Building on the approach described by Frutos et al. (1997), we pose conditions on sets of "t-parts" and "m-parts", T and M, so that their Cartesian product S = T × M satisfies the requirements of the original problem. Of the three conditions below, T is required to satisfy (1′) and (2′), and M is required to satisfy (2′) and (3′). For a set of bit-vectors V, the conditions are: (1′) each bit-vector in V sums to 4; (2′) each pair of distinct bit-vectors in V differs in at least 4 positions; and (3′) for each pair of bit-vectors (not necessarily distinct) u, v ∈ V, $u^R$ (the reverse of u) and $v^C$ (the complement of v) differ in at least 4 positions. This is equivalent to requiring that $(u^R)^C$ differs from v in at least 4 positions.
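The three conditions on T and M translate directly into checks on bit strings. In this sketch (hypothetical names; bit-vectors as strings of '0' and '1'), note that condition (3′) also applies to u = v, which rules out any m-part equal to its own reverse complement.

```python
from itertools import combinations


def hamming(u, v):
    """Number of positions in which two equal-length bit strings differ."""
    return sum(a != b for a, b in zip(u, v))


def reverse_complement(u):
    """(u^R)^C: reverse the bit string, then flip every bit."""
    return "".join("1" if b == "0" else "0" for b in reversed(u))


def valid_t_parts(T):
    """Conditions (1') and (2') on a list of distinct t-parts."""
    return (all(u.count("1") == 4 for u in T)
            and all(hamming(u, v) >= 4 for u, v in combinations(T, 2)))


def valid_m_parts(M):
    """Conditions (2') and (3') on a list of distinct m-parts."""
    return (all(hamming(u, v) >= 4 for u, v in combinations(M, 2))
            and all(hamming(reverse_complement(u), v) >= 4 for u in M for v in M))


def words(T, M):
    """The Cartesian product S = T x M as (t-part, m-part) pairs."""
    return [(t, m) for t in T for m in M]
```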
It is this strategy that we model in our BEE encoding. An instance takes the form dna($n_1$, $n_2$), signifying the numbers of bit-vectors, $n_1$ and $n_2$, in the sets T and M. To remove symmetries, we impose, without loss of generality, that T and M are lexicographically ordered. A solution is the Cartesian product S = T × M.
Using BEE, we find, in a fraction of a second, sets of t-parts of size 14 and m-parts of size 8. This provides a solution of size 14 × 8 = 112 to the DNA word problem. Running Comet (v2.0.1), we find a 112-word solution in about 10 seconds using a model by Håkan Kjellerstrand. Using BEE, we also prove that there does not exist a set of 15 t-parts (0.15 seconds), nor a set of 9 m-parts (4.47 seconds). These facts were unknown prior to BEE. Proving that there is no solution to the DNA word problem with more than 112 words, without the restriction to the two-part t-m strategy, is still an open problem.

MacDougall, Miller, Slamin, and Wallis (2002) conjecture that the n-vertex complete graph $K_n$, for $n \geq 5$, has a vertex magic total labeling with magic constant k for a specific range of values of k determined by n. This conjecture is proved correct for all odd n and verified by brute force for n = 6. We address the cases n = 8 and n = 10, which involve 15 instances (different values of k) for n = 8 and 23 instances for n = 10. Starting from the simple constraint model (illustrated by the example in Figure 16), we add constraints that exploit the fact that the graphs are symmetric: (1) we assume that the edge with the smallest label is $e_{1,2}$; (2) we assume that the labels of the edges incident to $v_1$ are ordered, and hence introduce the constraints $e_{1,2} < e_{1,3} < \cdots < e_{1,n}$; (3) we assume that the label of edge $e_{1,3}$ is smaller than the labels of the edges incident to $v_2$ (except $e_{1,2}$), and introduce constraints accordingly. In this setting, with a 4-hour timeout, BEE can solve all except 2 instances and Sugar can solve all except 4.

Table 4 gives results for the 10 hardest instances for $K_8$ and the 20 hardest instances for $K_{10}$, with a 4-hour timeout. BEE compilation times are on the order of 0.5 sec/instance for $K_8$ and 2.5 sec/instance for $K_{10}$. Sugar encoding times are slightly larger. The instances are indicated by the magic constant k; the columns for BEE and Sugar indicate SAT solving times (in seconds). The bottom two lines indicate average encoding sizes (numbers of clauses and variables).

The results indicate that the Sugar encodings are, on average, about 60% larger, while SAT solving for the BEE encodings is, on average, about 2 times faster (excluding instances where Sugar times out).
To address the two VMTL instances not solvable using the BEE models described above ($K_{10}$ with magic constants 259 and 258), we partition the problem by fixing the values of $e_{1,2}$ and $e_{1,3}$ and maintaining all of the other constraints. Analysis of the symmetry breaking constraints indicates that this results in 198 new instances for each of the two cases. The original VMTL instance is solved if any one of these 198 instances is solved, so we solve them in parallel. Fixing $e_{1,2}$ and $e_{1,3}$ "fuels" the compiler, so the encodings are considerably smaller. The instance for k = 259 is solved in 1379.50 seconds, where $e_{1,2} = 1$ and $e_{1,3} = 6$. The compilation time is 2.09 seconds and the encoding consists of just over 1 million clauses and 15 thousand variables.
To the best of our knowledge, the hard instances from this suite are beyond the reach of all previous approaches to programming the search for magic labels. The SAT-based approach presented by Jäger (2010) cannot handle these. The comparison with Sugar indicates the impact of the compiler.

Balanced Incomplete Block Designs
This is Problem 028 of CSPlib (BIBD), where an instance is defined by a 5-tuple of positive integers [v, b, r, k, λ] and requires arranging v distinct objects into b blocks such that each block contains k different objects, each object occurs in exactly r blocks, and every two distinct objects occur together in exactly λ blocks. This model does not contain a sufficient degree of information to trigger the equi-propagation process. In order to take advantage of the BEE simplifications, we added symmetry breaking as described by Frisch, Jefferson, and Miguel (2004).

Table 6 shows results comparing BEE using the SymB model with the Minion constraint solver (Gent, Jefferson, & Miguel, 2006). We consider three different models for Minion: [M'06] indicates results using the BIBD model described by Gent et al. (2006); SymB uses the same model we use for the SAT approach; and SymB+ is an enhanced symmetry breaking model with all of the tricks applied also in the [M'06] model. For the columns with no timeouts we show total times (for BEE this includes compile time and SAT solving). Note that, by using a clever modeling of the problem, we have also improved on the previous run-times for Minion.
This experiment indicates that BEE is significantly faster than Minion on its BIBD models ([M'06]). Only when our SymB model is tailored further does Minion become competitive with ours.
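For reference, the BIBD conditions can be checked on a v × b incidence matrix, together with the standard counting identities bk = vr and λ(v - 1) = r(k - 1) that any instance must satisfy. The sketch below (hypothetical names) is a checker, not the SymB model; its test uses the Fano plane, a [7, 7, 3, 3, 1] design, and the instance parameters listed in Table 7.

```python
from itertools import combinations


def is_bibd(matrix, v, b, r, k, lam):
    """Check a v x b 0/1 incidence matrix (rows: objects, columns: blocks)."""
    if len(matrix) != v or any(len(row) != b for row in matrix):
        return False
    rows_ok = all(sum(row) == r for row in matrix)                     # r blocks per object
    cols_ok = all(sum(row[j] for row in matrix) == k for j in range(b))  # k objects per block
    pairs_ok = all(sum(x & y for x, y in zip(p, q)) == lam             # lambda per pair
                   for p, q in combinations(matrix, 2))
    return rows_ok and cols_ok and pairs_ok


def necessary_conditions(v, b, r, k, lam):
    """Counting identities satisfied by every BIBD instance."""
    return b * k == v * r and lam * (v - 1) == r * (k - 1)
```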

Combining BEE with SatELite
We now demonstrate the impact of combining BEE and SatELite. We describe experiments involving two of the benchmarks, where SatELite is applied to simplify the output of BEE. The idea is to first apply the more powerful, but local, techniques performed by BEE, which reduce the size of the CNF and are fast, and then to apply SatELite, which considers the CNF as a whole. We wish to determine whether the smaller, simplified CNF is amenable to further simplification using SatELite. The results indicate that although CNF size is slightly decreased, solving times are most often increased, sometimes drastically.
Tables 7 and 8 show our results. In both tables, the four columns under the BEE heading indicate the BEE compilation time, the size of the encoding (clauses and variables), and the subsequent SAT solving time. Similarly, the four columns under the ∆ SatELite heading indicate the application of SatELite to the output of BEE: the SatELite processing time, the size of the resulting CNF (clauses and variables), and the subsequent SAT solving time. Table 7 shows the results for the BIBD benchmark of Section 8.4, and Table 8 the results for the 10 hardest VMTL instances for $K_8$ and for $K_{10}$ described in Section 8.3. Observe that applying SatELite to the output of BEE decreases the CNF size only slightly and does not improve the SAT solving time. On the contrary, in most cases it renders a CNF which takes more time to solve, and in several cases the SAT solving time increases so drastically that it results in a timeout.

Table 7: BIBD instances, BEE versus BEE followed by SatELite.

| Instance | BEE comp (sec) | BEE clauses | BEE vars | BEE SAT (sec) | ∆ SatELite time (sec) | ∆ SatELite clauses | ∆ SatELite vars | ∆ SatELite SAT (sec) |
|---|---|---|---|---|---|---|---|---|
| [7,420,180,3,60] | 1.65 | 698579 | 41399 | 1.73 | 1.88 | 696914 | 38749 | 3.41 |
| [7,560,240,3,80] | 3.73 | 1211941 | 58445 | 13.60 | 3.14 | 1209788 | 54043 | 6.97 |
| [12,132,33,3,6] | 0.95 | 180238 | 31947 | 0.73 | 1.20 | 179700 | 28351 | 0.91 |

Our results demonstrate that applying SatELite to remove redundancies from a CNF is often not beneficial. We presume that the difference from what is observed when applying SatELite to other CNF benchmarks stems from the fact that BEE produces highly optimized CNF output, while many CNF benchmarks have significant inefficiencies in their original encoding. If BEE removes a variable from the CNF, then it also instantiates that variable, either to a constant or to an equivalent variable, and as such does not remove potential propagations from the encoding, as captured by Theorem 9.

Conclusion
There is a considerable body of work on CNF simplification techniques, with a clear trade-off between the amount of reduction achieved and the time invested. Most of these approaches determine binary clauses implied by the CNF, which is certainly enough to determine Boolean equalities. The problem is that determining all binary clauses implied by the CNF is prohibitive when the SAT model may involve many (hundreds of) thousands of variables. Typically, only some of the implied binary clauses are determined, such as those visible by unit propagation. The trade-off is regulated by the choice of the techniques applied to infer binary clauses, considering their power and cost; see for example the work of Eén and Biere (2005) and the references therein. There are also approaches (Li, 2003) that detect and use Boolean equalities at run-time, which are complementary to our approach.
In our approach, the beast is tamed by introducing a notion of locality. We do not consider the full CNF. Instead, by maintaining the original representation, a conjunction of constraints, each viewed as a Boolean formula, we can apply powerful reasoning techniques to separate parts of the model while keeping pre-processing efficient.
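Restricted to a single constraint over a handful of variables, even complete reasoning becomes affordable. The Python sketch below performs complete equi-propagation on such a fragment by enumerating all of its models and reporting the variables fixed in every model and the pairs that are always equal or always different. Enumeration is our stand-in for illustration only: BEE itself relies on ad-hoc rules, and the complete variant reported in Table 1 is SAT-based. The function name and the formula-as-callable interface are hypothetical.

```python
from itertools import product
from typing import Callable, Optional

Assignment = dict[str, bool]


def complete_equi_propagation(
    formula: Callable[[Assignment], bool], variables: list[str]
) -> Optional[tuple[dict[str, bool], list[tuple[str, str, bool]]]]:
    """Fixed variables and equalities that hold in every model of a small fragment.

    Enumeration is exponential in len(variables), which is only affordable
    because the fragment is a single constraint over a few variables.
    Returns None when the fragment has no model.
    """
    assignments = [dict(zip(variables, bits))
                   for bits in product([False, True], repeat=len(variables))]
    models = [m for m in assignments if formula(m)]
    if not models:
        return None
    first = models[0]
    fixed = {v: first[v] for v in variables if all(m[v] == first[v] for m in models)}
    equalities = []
    for i, x in enumerate(variables):
        for y in variables[i + 1:]:
            if x in fixed or y in fixed:
                continue  # already determined to be a constant
            if all(m[x] == m[y] for m in models):
                equalities.append((x, y, True))   # x = y
            elif all(m[x] != m[y] for m in models):
                equalities.append((x, y, False))  # x = not y
    return fixed, equalities


# Example fragment: a must hold and exactly two of a, b, c are true.
result = complete_equi_propagation(
    lambda m: m["a"] and m["a"] + m["b"] + m["c"] == 2, ["a", "b", "c"])
print(result)
```

Here the fragment fixes a to true and forces b = ¬c; both facts can then be applied to the rest of the model.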
To this end, we introduce BEE, a compiler that follows this approach to encode finite domain constraints to CNF. Applying optimizations based on ad-hoc equi-propagation and partial evaluation rules to a high-level view of the problem allows us to simplify the problem more aggressively than is possible with a CNF representation. The resulting CNF models can be significantly smaller than those resulting from a straightforward translation.
It is well understood that making a CNF smaller is not the ultimate goal: smaller CNFs are often harder to solve. Indeed, one often introduces redundancies to improve SAT encodings, so removing them is counterproductive. Our experience is that BEE reduces the size of an encoding in a way that is productive for the subsequent SAT solving, in particular by removing variables that can be determined "at compile time" to be definitely equal (or definitely different) in any solution.
BEE uses ad-hoc equi-propagation and partial evaluation rules, which keep compilation times small (typically measured in seconds) even for instances that result in several million CNF clauses, while the reduction in SAT solving time can amount to orders of magnitude. Hence, we believe that Boolean equi-propagation makes an important contribution to the encoding of CSPs to SAT.
BEE is currently tuned to represent integers in the order encoding. Ongoing work aims to extend BEE to binary and other number representations, such as the mixed radix bases considered by Eén and Sörensson (2006) and further by Codish, Fekete, Fuhs, and Schneider-Kamp (2011).
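For readers unfamiliar with the order encoding, the following Python sketch shows one common convention: an integer X in [lo, hi] is represented by hi − lo bits, where bit i (counting from 0) is true iff X ≥ lo + i + 1, and binary clauses x_{i+1} → x_i force the bits to form a block of ones followed by zeros. The function names are ours, and the exact indexing used inside BEE may differ.

```python
def order_encode(value: int, lo: int, hi: int) -> list[bool]:
    """Order (unary) encoding of value in [lo, hi]: bit i is true iff value >= lo + i + 1."""
    if not lo <= value <= hi:
        raise ValueError("value outside the domain")
    return [value >= lo + i + 1 for i in range(hi - lo)]


def order_decode(bits: list[bool], lo: int) -> int:
    """A valid order-encoded vector is a block of ones followed by zeros."""
    return lo + sum(bits)


def ordering_clauses(xs: list[int]) -> list[list[int]]:
    """Binary clauses (x_{i+1} -> x_i) over DIMACS variable ids xs."""
    return [[-xs[i + 1], xs[i]] for i in range(len(xs) - 1)]


print(order_encode(3, 0, 5))
print(ordering_clauses([1, 2, 3, 4]))
```

Because equalities between such bits are plain variable equalities, constraints over order-encoded integers are a natural target for equi-propagation.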

Proof.
Let $\varphi$ be a Boolean formula, $E$ an equi-formula, and let $C_\varphi$ and $C_E$ be any clausal representations of $\varphi$ and of $E$, respectively. Clearly $\varphi \models C_\varphi$ and $E \models C_E$. Let $b$ be a positive literal determined by unit propagation of $C_\varphi \cup C_E$. Then, by correctness of unit propagation, $C_\varphi \cup C_E \models b$. Hence $\varphi \wedge E \models b$ and thus $\mu_\varphi(E) \models b = 1$. The case for a negative literal $\neg b$ is the same, except that we infer $b = 0$.

Figure 8: Ad-hoc equi-propagation described in Example 11.
Figure 10(b) shows two steps of partial evaluation for an int_neq constraint: first removing leading ones, then removing trailing zeros.
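The following Python sketch mimics these two partial-evaluation steps for int_neq under our own assumptions: both integers are given as order-encoded bit vectors of equal length over the same domain, and each bit is either a Boolean constant or a variable name. Since X ≠ Y holds exactly when the two vectors differ somewhere, a position where both carry the same constant can never witness the difference, so leading positions that are 1 in both vectors and trailing positions that are 0 in both can be dropped. The function name is hypothetical, not BEE's API.

```python
from typing import Union

Bit = Union[bool, str]  # a Boolean constant or the name of a propositional variable


def simplify_int_neq(xs: list[Bit], ys: list[Bit]) -> tuple[list[Bit], list[Bit]]:
    """Partially evaluate int_neq(X, Y) on order-encoded bit vectors of equal length."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes vectors of equal length")
    start, end = 0, len(xs)
    # Leading ones: both X and Y are known to be at least start + 1.
    # `is True` / `is False` distinguishes constants from variable names.
    while start < end and xs[start] is True and ys[start] is True:
        start += 1
    # Trailing zeros: both X and Y are known to be below this position.
    while end > start and xs[end - 1] is False and ys[end - 1] is False:
        end -= 1
    return xs[start:end], ys[start:end]


print(simplify_int_neq([True, True, "x3", "x4", False],
                       [True, "y2", "y3", False, False]))
```

After stripping k leading ones, the remaining vectors encode X − k and Y − k, so the disequality is preserved. If both vectors become empty, the two integers are known to be equal and the constraint is unsatisfiable.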

Figure 15: A generic application of BEE.

Figure 16: A VMTL instance with a solution.

Figure 17: encode/3 predicate for the VMTL application of BEE

8.2 Word Design for DNA
This is Problem 033 of CSPLib, which seeks the largest parameter n such that there exists a set S of n eight-letter words over the alphabet Σ = {A, C, G, T} with the following properties: (1) each word in S has exactly 4 symbols from {C, G}; (2) each pair of distinct words in S differs in at least 4 positions; and (3) for every x, y ∈ S, $x^R$ (the reverse of x) and $y^C$ (the word obtained by replacing each A by T, each C by G, and vice versa) differ in at least 4 positions.
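To make the three properties concrete, the following Python sketch checks whether a given set of words satisfies them. It is a solution checker only (it does not search for the largest n), and the function names are ours. Property (3) is checked for all pairs, including x = y, since the text quantifies over every x, y ∈ S.

```python
# Watson-Crick complement: A <-> T, C <-> G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hamming(u: str, v: str) -> int:
    """Number of positions in which two equal-length words differ."""
    return sum(a != b for a, b in zip(u, v))


def is_valid_word_set(words: list[str]) -> bool:
    for w in words:
        if len(w) != 8 or set(w) - set("ACGT"):
            return False
        if sum(ch in "CG" for ch in w) != 4:       # property (1)
            return False
    for i, x in enumerate(words):
        for y in words[i + 1:]:
            if hamming(x, y) < 4:                    # property (2)
                return False
    for x in words:
        for y in words:                              # property (3), includes x == y
            if hamming(x[::-1], y.translate(COMPLEMENT)) < 4:
                return False
    return True
```

For instance, {AAAACCCC, CCCCAAAA} passes all three checks, while AACCGGTT alone fails property (3) because its reverse equals its complement.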
and illustrated in Figure 20: Each row is viewed as a sequence of four parts A to D with sizes λ, (r − λ), (r − λ), and (b − 2r + λ). The first row is fixed by assigning ones to parts A and B (marked in black) and zeros to parts C and D (marked in white). The second row is fixed by assigning ones to parts A and C (marked in black) and zeros to parts B and D (marked in white). For the third and all subsequent rows (marked in gray), the sum constraints are decomposed into summing each part (A to D) and then summing the results as follows: A + B = λ, A + C = λ, C + D = r − λ, and B + D = r − λ. This ensures that the row contains exactly r ones and that its scalar product with the first (and second) row is λ. We denote this constraint model SymB (for symmetry breaking).
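The following Python sketch builds the two fixed rows of the SymB model for parameters b, r and λ (written `lam`) and checks a candidate later row through the four part sums above. It is an illustrative checker, not BEE's encoding; the function names are ours.

```python
def part_sizes(b: int, r: int, lam: int) -> list[int]:
    """Sizes of parts A, B, C, D of each row in the SymB model."""
    return [lam, r - lam, r - lam, b - 2 * r + lam]


def fixed_rows(b: int, r: int, lam: int) -> tuple[list[int], list[int]]:
    """Row 1 has ones in parts A and B; row 2 has ones in parts A and C."""
    sizes = part_sizes(b, r, lam)

    def row(ones_in: set[int]) -> list[int]:
        return [1 if p in ones_in else 0
                for p, size in enumerate(sizes) for _ in range(size)]

    return row({0, 1}), row({0, 2})


def satisfies_symb(row: list[int], b: int, r: int, lam: int) -> bool:
    """Check a later row through the decomposed part sums."""
    sums, pos = [], 0
    for size in part_sizes(b, r, lam):
        sums.append(sum(row[pos:pos + size]))
        pos += size
    pa, pb, pc, pd = sums
    return (pa + pb == lam and pa + pc == lam
            and pc + pd == r - lam and pb + pd == r - lam)
```

For the Fano plane parameters (b, r, λ) = (7, 3, 1), the parts have sizes 1, 2, 2 and 2, and the row 1000011 passes the check: it has r = 3 ones and scalar product λ = 1 with both fixed rows.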
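is satisfiable if and only if $\mathit{unify}_E(\varphi)$ is satisfiable; and c. if $\sigma$ is a satisfying assignment for $\mathit{unify}_E(\varphi)$, then $\sigma \circ \mathit{unify}_E$ is a satisfying assignment for $\varphi \wedge E$.
Assume $\sigma$ is a satisfying assignment of $\mathit{unify}_E(\varphi)$. Clearly $\sigma \circ \mathit{unify}_E$ satisfies $\varphi$ by construction. Also, $\sigma \circ \mathit{unify}_E$ satisfies $E$ since $\mathit{unify}_E(E)$ is trivial. Hence $\sigma \circ \mathit{unify}_E$ is a satisfying assignment of $\varphi \wedge E$.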
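The substitution unify_E and the lifting σ ∘ unify_E can be sketched in Python with a union-find structure over variables that tracks polarity, so that equations x = y, x = ¬y, x = 1 and x = 0 are handled uniformly. This is our minimal sketch under stated assumptions: ϕ is given in CNF with (variable, polarity) literals, the constant 1 is represented by a reserved, hypothetical variable name `TOP`, and variables whose representative is left unassigned by σ default to 0.

```python
TOP = "__true__"  # hypothetical reserved variable standing for the constant 1


class Unifier:
    """A sketch of unify_E: each variable maps to (root, flip), meaning x = root XOR flip."""

    def __init__(self) -> None:
        self.parent: dict[str, tuple[str, bool]] = {}

    def find(self, x: str) -> tuple[str, bool]:
        if x not in self.parent:
            return x, False
        p, f = self.parent[x]
        root, g = self.find(p)
        self.parent[x] = (root, f != g)  # path compression
        return root, f != g

    def equate(self, x: str, y: str, negated: bool = False) -> None:
        """Record x = y, or x = not y when negated (use y = TOP for constants)."""
        rx, fx = self.find(x)
        ry, fy = self.find(y)
        flip = fx ^ fy ^ negated
        if rx == ry:
            if flip:
                raise ValueError("E is unsatisfiable")
            return
        if rx == TOP:  # keep the constant as a root
            rx, ry = ry, rx
        self.parent[rx] = (ry, flip)

    def apply(self, cnf: list[list[tuple[str, bool]]]) -> list[list[tuple[str, bool]]]:
        """unify_E(phi): substitute representatives, drop satisfied clauses and false literals."""
        out = []
        for clause in cnf:
            lits, satisfied = set(), False
            for var, positive in clause:
                root, flip = self.find(var)
                pol = positive != flip
                if root == TOP:
                    satisfied = satisfied or pol  # literal is the constant 1 or 0
                    continue
                if (root, not pol) in lits:
                    satisfied = True  # clause contains root and not root
                lits.add((root, pol))
            if not satisfied:
                out.append(sorted(lits))  # an empty clause means unsatisfiable
        return out

    def extend(self, sigma: dict[str, bool], variables: list[str]) -> dict[str, bool]:
        """sigma o unify_E: lift an assignment of the simplified CNF to all variables."""
        result = {}
        for v in variables:
            root, flip = self.find(v)
            base = True if root == TOP else sigma.get(root, False)
            result[v] = base != flip
        return result
```

For example, with ϕ = (x ∨ y) ∧ (¬x ∨ z) ∧ (¬z ∨ w) and E = (y = ¬x) ∧ (w = 1), the simplified formula is the single clause ¬x ∨ z, and lifting σ = {x ↦ 0, z ↦ 0} gives an assignment that satisfies ϕ ∧ E.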

Table 1: Applying SAT-based complete equi-propagation on Kakuro encoding.
The first two columns in the table indicate the instance category and ID. From the five columns headed "Integer …
All experiments are run on an Intel Core 2 Duo E8400 3.00GHz CPU with 4GB memory under Linux (Ubuntu lucid, kernel 2.6.32-24-generic). BEE is written in Prolog and run …
… Without the dual representation, solving encodings generated by BEE is only slightly faster than solving those generated by Sugar, but BEE still generates CNF encodings 4 times smaller (on average) than those generated by Sugar. Observe that 3 instances are found unsatisfiable by BEE (indicated by a CNF with a single clause and no variables). We comment that Sugar pre-processing times are higher than those of BEE and are not indicated in the table.

Table 3: QCP results for 25 × 25 instances with 264 holes.

Table 5: Comparing BEE (compilation time, clauses in encoding, and SAT solving time) with Sugar using the SymB model.
We also compare BEE with SatELite (Eén & Biere, 2005), a CNF minimizer, where the input to SatELite is the CNF encoding for the SymB model generated by BEE without applying any simplifications. Here, compilation time (comp) indicates the SatELite pre-processing time. The final row indicates the total of compilation and SAT solving time over the entire suite for each approach. In all cases, time is measured in seconds. This experiment indicates that BEE generates a significantly smaller CNF than Sugar, which affects the SAT solving time. Moreover, the Sugar compilation time is extremely long. Comparing BEE with SatELite, we see that both output CNFs of similar size, but since SatELite is applied to the entire CNF, for some instances its compilation time is significantly longer than its solving time.