Relaxed Survey Propagation for The Weighted Maximum Satisfiability Problem

The survey propagation (SP) algorithm has been shown to work well on large instances of the random 3-SAT problem near its phase transition. It was shown that SP estimates marginals over covers that represent clusters of solutions. The SP-y algorithm generalizes SP to work on the maximum satisfiability (Max-SAT) problem, but the cover interpretation of SP does not generalize to SP-y. In this paper, we formulate the relaxed survey propagation (RSP) algorithm, which extends the SP algorithm to apply to the weighted Max-SAT problem. We show that RSP has an interpretation of estimating marginals over covers violating a set of clauses with minimal weight. This naturally generalizes the cover interpretation of SP. Empirically, we show that RSP outperforms SP-y and other state-of-the-art Max-SAT solvers on random Max-SAT instances. RSP also outperforms state-of-the-art weighted Max-SAT solvers on random weighted Max-SAT instances.


Introduction
The 3-SAT problem is the archetypical NP-complete problem, and the difficulty of solving random 3-SAT instances has been shown to be related to the clause-to-variable ratio, α = M/N, where M is the number of clauses and N the number of variables. A phase transition occurs at the critical value α_c ≈ 4.267: random 3-SAT instances with α < α_c are generally satisfiable, while instances with α > α_c are not. Instances close to the phase transition are generally hard to solve using local search algorithms (Mezard & Zecchina, 2002; Braunstein, Mezard, & Zecchina, 2005).
The survey propagation (SP) algorithm was invented in the statistical physics community using approaches developed for analyzing phase transitions in spin glasses (Mezard & Zecchina, 2002). The SP algorithm has surprised computer scientists by its ability to efficiently solve extremely large and difficult Boolean satisfiability (SAT) instances in the phase transition region. The algorithm has also been extended to the SP-y algorithm to handle the maximum satisfiability (Max-SAT) problem (Battaglia, Kolar, & Zecchina, 2004).
Progress has been made in understanding why the SP algorithm works well. Braunstein and Zecchina (2004) first showed that SP can be viewed as the belief propagation (BP) algorithm (Pearl, 1988) on a related factor graph where only clusters of solutions represented by covers have non-zero probability. It is not known whether a similar interpretation can be given to the SP-y algorithm. In this paper, we extend the SP algorithm to handle weighted Max-SAT instances in a way that preserves the cover interpretation, and we call this new algorithm the Relaxed Survey Propagation (RSP) algorithm. Empirically, we show that RSP outperforms SP-y and other state-of-the-art solvers on random Max-SAT instances. It also outperforms state-of-the-art solvers on a few benchmark Max-SAT instances. On random weighted Max-SAT instances, it outperforms state-of-the-art weighted Max-SAT solvers.

© 2009 AI Access Foundation. All rights reserved.
The rest of this paper is organized as follows. In Section 2, we describe the background literature and mathematical notation necessary for understanding this paper. This includes a brief review of joint probability distributions over factor graphs, an introduction to the SAT, Max-SAT and weighted Max-SAT problems, and how they can be formulated as inference problems over a probability distribution on a factor graph. In Section 3, we give a review of the BP algorithm (Pearl, 1988), which plays a central role in this paper. In Section 4, we give a description of the SP (Braunstein et al., 2005) and SP-y (Battaglia et al., 2004) algorithms, explaining them as warning propagation algorithms. In Section 5, we define a joint distribution over an extended factor graph given a weighted Max-SAT instance. This factor graph generalizes the factor graph defined by Maneva, Mossel and Wainwright (2004) and by Chieu and Lee (2008). We show that, for solving SAT instances, running the BP algorithm on this factor graph is equivalent to running the SP algorithm derived by Braunstein, Mezard and Zecchina (2005). For the weighted Max-SAT problem, this gives rise to a new algorithm that we call the Relaxed Survey Propagation (RSP) algorithm. In Section 7, we show empirically that RSP outperforms other algorithms for solving hard Max-SAT and weighted Max-SAT instances.

Background
While SP was first derived from principles in statistical physics, it can be understood as a BP algorithm, estimating marginals for a joint distribution defined over a factor graph. In this section, we will provide background material on joint distributions defined over factor graphs. We will then define the Boolean satisfiability (SAT) problem, the maximum satisfiability (Max-SAT) problem, and the weighted maximum satisfiability (weighted Max-SAT) problem, and show that these problems can be solved by solving an inference problem over joint distributions defined on factor graphs. A review of the definition and derivation of the BP algorithm will then follow in the next section, before we describe the SP algorithm in Section 4.

Notations
First, we will define notations and concepts that are relevant to inference problems over factor graphs. Factor graphs provide a framework for reasoning about and manipulating the joint distribution over a set of variables. In general, variables could be continuous in nature, but in this paper, we limit ourselves to discrete random variables.
In this paper, we denote random variables using capital Roman letters, e.g., X, Y. The random variables are always discrete in this paper, taking values in a finite domain. Usually, we are interested in vectors of random variables, for which we will write the letters in bold face, e.g., X, Y. We will often index random variables by the letters i, j, k, ..., and write, for example, X = {X_i}_{i∈V}, where V is a finite set. For a subset W ⊆ V, we will denote X_W = {X_i}_{i∈W}. We call an assignment of values to the variables in X a configuration, and will denote it in small bold letters, e.g., x. We will often write x to represent the event X = x and, for a probability distribution p, write p(x) to mean p(X = x). Similarly, we will write x_i to denote the event X_i = x_i, and write p(x_i) to denote p(X_i = x_i).

Figure 1: A simple factor graph for p(x) = ψ_β(x_1, x_2) ψ_β′(x_1, x_3) ψ_β′′(x_2, x_4).
A recurring theme in this paper will be defining message passing algorithms for joint distributions on factor graphs (Kschischang, Frey, & Loeliger, 2001). In a joint distribution defined as a product of local functions (functions defined on a small subset of variables), we will refer to the local functions as factors. We will index factors, e.g., ψ_β, with Greek letters, e.g., β, γ (avoiding α, which is used as the symbol for the clause-to-variable ratio in SAT instances). For each factor ψ_β, we denote by V(β) ⊆ V the subset of variables on which ψ_β is defined, i.e., ψ_β is a function defined on the variables X_{V(β)}. In message passing algorithms, messages are vectors of real numbers that are sent from factors to variables or vice versa. A vector message sent from a variable X_i to a factor ψ_β will be denoted as M_{i→β}, and a message from ψ_β to X_i will be denoted as M_{β→i}.

Joint Distributions and Factor Graphs
Given a large set of discrete random variables X = {X_i}_{i∈V}, we are interested in the joint probability distribution p(X) over these variables. When the set V is large, it is often of interest to assume a simple decomposition, so that we can draw conclusions efficiently from the distribution. In this paper, we are interested in joint probability distributions that can be decomposed as follows:

p(x) = (1/Z) ∏_{β∈F} ψ_β(x_{V(β)}),     (1)

where the set F indexes a set of functions {ψ_β}_{β∈F}. Each function ψ_β is defined on a subset of variables X_{V(β)} of the set X, and maps configurations x_{V(β)} into non-negative real numbers. Assuming that each function ψ_β is defined on a small subset of variables X_{V(β)}, we hope to do efficient inference with this distribution, despite the large number of variables in X. The constant Z is a normalization constant, which ensures that the distribution sums to one over all configurations x of X.
A factor graph (Kschischang et al., 2001) provides a useful graphical representation illustrating the dependencies defined in the joint probability distribution in Equation 1. A factor graph G = ({V, F}, E) is a bipartite graph with two sets of nodes: the set of variable nodes, V, and the set of factor nodes, F. The set of edges E in the factor graph connects variable nodes to factor nodes, hence the bipartite nature of the graph. For a factor graph representing the joint distribution in Equation 1, an edge e = (β, i) is in E if and only if the variable X_i is a parameter of the factor ψ_β (i.e., i ∈ V(β)). We will denote by V(i) the set of factors depending on the variable X_i, i.e.,

V(i) = {β ∈ F : i ∈ V(β)}.
We show a simple example of a factor graph in Figure 1. In this small example, we have, for example, V(β) = {1, 2} and V(2) = {β, β′′}. The factor graph representation is useful for illustrating inference algorithms on joint distributions in the form of Equation 1 (Kschischang et al., 2001). In Section 3, we will describe the BP algorithm by using the factor graph representation. Equation 1 defines the joint distribution as a product of local factors. It is often useful to represent the distribution in the following exponential form:

p(x) = exp( ∑_{β∈F} φ_β(x_{V(β)}) − Φ ).     (3)

The above equation is a reparameterization of Equation 1, with φ_β(x_{V(β)}) = ln ψ_β(x_{V(β)}) and Φ = ln Z. In statistical physics, the exponential form is often written as follows:

p(x) = (1/Z) exp( −E(x) / (k_B T) ),     (4)

where E(x) is the Hamiltonian or energy function, k_B is the Boltzmann constant, and T is the temperature. For simplicity, we set k_B T = 1, and Equations 3 and 4 are equivalent with E(x) = −∑_{β∈F} φ_β(x_{V(β)}). Bayesian (belief) networks and Markov random fields are two other graphical representations often used to describe multi-dimensional probability distributions. Factor graphs are closely related to both, and algorithms operating on factor graphs are often directly applicable to Bayesian networks and Markov random fields. We refer the reader to the work of Kschischang et al. (2001) for a comparison between factor graphs, Bayesian networks, and Markov random fields.
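A joint distribution of the form of Equation 1 can be represented concretely and, for small graphs, normalized by brute force. The sketch below builds a graph in the shape of Figure 1; the pairwise factor ψ(a, b) = exp(a·b) and the variable indexing 0..3 are illustrative assumptions, not choices made in the paper.

```python
import itertools
import math

# An illustrative pairwise factor; any non-negative local function works.
def psi(a, b):
    return math.exp(a * b)

# Each factor is (scope of variable indices, local function);
# variables take values in {-1, +1}, mirroring the SAT convention used later.
factors = [((0, 1), psi), ((0, 2), psi), ((1, 3), psi)]
N = 4

def unnormalized(x):
    # Product of all factors evaluated at configuration x (Equation 1 without Z).
    return math.prod(fn(*(x[i] for i in scope)) for scope, fn in factors)

# The normalization constant Z sums over all 2^N configurations.
Z = sum(unnormalized(x) for x in itertools.product((-1, +1), repeat=N))

def p(x):
    return unnormalized(x) / Z
```

This brute-force normalization is only feasible for tiny graphs; the point of the BP algorithm described in Section 3 is to avoid the exponential sum on tree-structured graphs.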

Inference on Joint Distributions
In the literature, "inference" on a joint distribution can refer to solving one of two problems. We define the two problems as follows:

Problem 1 (MAP problem). Given a joint distribution p(x), we are interested in the configuration(s) with the highest probability. Such configurations,

x* = argmax_x p(x),

are called the maximum a-posteriori configurations, or MAP configurations. From the joint distribution in Equation 4, the MAP configuration minimizes the energy function E(x), and hence the MAP problem is sometimes called the energy minimization problem.
Problem 2 (Marginal problem). Given a joint distribution p(x), of central interest is the calculation or estimation of probabilities of events involving a single variable, X_i = x_i. We refer to such probabilities as marginal probabilities:

p_i(x_i) = ∑_{x\x_i} p(x),

where the notation x\x_i means summing over all configurations of X with the variable X_i set to x_i. Marginals are important as they represent the underlying distribution of individual variables.
In general, both problems are not solvable in reasonable time by currently known methods. Naive calculation of p_i(x_i) involves summing the probabilities of all configurations of the variables X for which X_i = x_i. For example, in a factor graph with n variables of cardinality q, finding the marginal of one of the variables will involve summing over q^{n−1} configurations. Furthermore, NP-complete problems such as 3-SAT can be simply coded as factor graphs (see Section 2.4.1). As such, the MAP problem is in general NP-complete, while the marginal problem is equivalent to model counting for 3-SAT, and is #P-complete (Cooper, 1990). Hence, in general, we do not expect to solve the inference problems (exactly) in reasonable time, unless the problems are very small, or have special structures that can be exploited for efficient inference.
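The cost of the naive computation is easy to see in code: fixing X_i = x_i and summing out the remaining variables touches q^{n−1} configurations (2^{n−1} for binary variables). A sketch with hypothetical names (`p_unnorm`, `naive_marginal` are not from the paper):

```python
import itertools

def naive_marginal(p_unnorm, n, i, v, vals=(-1, +1)):
    """Unnormalized marginal: sum p_unnorm over all configurations of n
    variables with X_i fixed to v, i.e. len(vals)**(n-1) terms."""
    total = 0.0
    for rest in itertools.product(vals, repeat=n - 1):
        x = rest[:i] + (v,) + rest[i:]  # insert the fixed value at position i
        total += p_unnorm(x)
    return total
```

For n = 100 binary variables this loop would already need 2^99 evaluations, which is why approximate methods such as BP are needed.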
Of central interest in this paper is a particular approximate inference method known as the (sum-product) belief propagation (BP) algorithm. We defer the discussion of the BP algorithm to the next section. In the rest of this section, we will describe the SAT, Max-SAT and weighted Max-SAT problems, and how they can be simply formulated as inference problems on a joint distribution over a factor graph.

The SAT and Max-SAT Problem
A variable is Boolean if it takes values in {FALSE, TRUE}.In this paper, we will follow conventions in statistical physics, where Boolean variables take values in {−1, +1}, with −1 corresponding to FALSE, and +1 corresponding to TRUE.
The Boolean satisfiability (SAT) problem is given as a Boolean propositional formula written with the operators AND (conjunction), OR (disjunction), NOT (negation), and parentheses. The objective of the SAT problem is to decide whether there exists a configuration such that the propositional formula is satisfied (evaluates to TRUE). The SAT problem was the first problem shown to be NP-complete, in Stephen Cook's seminal paper in 1971 (Cook, 1971; Levin, 1973).
The three operators in Boolean algebra are defined as follows: given two propositional formulas A and B, OR(A, B) is true if either A or B is true; AND(A, B) is true only if both A and B are true; and NOT(A) is true if A is false. In the rest of the paper, we will use the standard notations in Boolean algebra for the Boolean operators: A ∨ B means OR(A, B), A ∧ B means AND(A, B), and Ā means NOT(A). Parentheses allow nested application of the operators, e.g., (A ∨ B) ∧ (B ∨ C).
The conjunctive normal form (CNF) is often used as a standard form for writing Boolean formulas. A CNF formula consists of a conjunction of disjunctions of literals, where a literal is either a variable or its negation. For example, (X_1 ∨ X_2) ∧ (X_3 ∨ X_4) is in CNF, while the negation of (X_1 ∨ X_2), and (X_1 ∧ X_2) ∨ (X_2 ∧ X_3), are not. Any Boolean formula can be rewritten in CNF using De Morgan's laws and the distributivity law, although in practice this may lead to an exponential blowup in the size of the formula, and the Tseitin transformation is often used instead (Tseitin, 1968). In CNF, a Boolean formula can be considered to be the conjunction of a set of clauses, where each clause is a disjunction of literals. Hence, a SAT problem is often given as (X, C), where X is the vector of Boolean variables, and C is a set of clauses. Each clause in C is satisfied by a configuration if it evaluates to TRUE for that configuration. Otherwise, it is said to be violated by the configuration. We will use Greek letters (e.g., β, γ) as indices for clauses in C, and denote by V(β) the set of variables in the clause β ∈ C. The K-SAT problem is a SAT problem in which each clause in C consists of exactly K literals. The K-SAT problem is NP-complete for K ≥ 3 (Cook, 1971).
The maximum satisfiability (Max-SAT) problem is the optimization version of the SAT problem, where the aim is to minimize the number of violated clauses in the formula. We define a simple working example of the Max-SAT problem that we will use throughout the paper:

Example 1. Define an instance of the Max-SAT problem in CNF over the three variables X_1, X_2, X_3, given as a set of clauses; the Boolean expression representing this problem is the conjunction of these clauses. The objective of the Max-SAT problem is to find a configuration minimizing the number of violated clauses.

Factor Graph Representation for SAT Instances
The SAT problem in CNF can easily be represented as a joint distribution over a factor graph.In the following definition, we give a possible definition of a joint distribution over Boolean configurations for a given SAT instance, where the Boolean variables take values in {−1, +1}.
Definition 1. Given an instance of the SAT problem, (X, C), in conjunctive normal form, where X is a vector of N Boolean variables, we define the energy, E(x), and the distribution, p(x), over configurations of the SAT instance (Battaglia et al., 2004) as

E(x) = ∑_{β∈C} ∏_{i∈V(β)} (1 + J_{β,i} x_i)/2,     p(x) = (1/Z) exp(−E(x)),

where x ∈ {−1, +1}^N, and J_{β,i} takes values in {−1, +1}. If J_{β,i} = +1 (resp. −1), then β contains X_i as a negative (resp. positive) literal. Each clause β is satisfied if one of its variables X_i takes the value −J_{β,i}. When a clause β is satisfied, its contribution to the energy is zero. With the above definition, the energy function is zero for satisfying configurations, and equals the number of violated clauses for non-satisfying configurations. Hence, satisfying configurations of the SAT instance are the MAP configurations of the factor graph.
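The energy in Definition 1 is easy to compute directly. The sketch below counts violated clauses for a small instance; the clause encoding (lists of (i, J_{β,i}) pairs) and the example clauses are illustrative assumptions, not taken from the paper.

```python
# Energy of Definition 1: a clause beta, stored as a list of (i, J) pairs,
# is violated exactly when every variable in it takes x_i = J_{beta,i}
# (each product term (1 + J*x_i)/2 equals 1); the energy counts violated clauses.
def energy(clauses, x):
    return sum(1 for clause in clauses
               if all(x[i] == J for i, J in clause))

# Illustrative instance: (X1 or X2) and (not X1 or X3), with J = -1 for
# a positive literal and J = +1 for a negative literal.
clauses = [[(0, -1), (1, -1)], [(0, +1), (2, -1)]]
```

A configuration with energy 0 is a satisfying (MAP) configuration, e.g., X1 = TRUE, X3 = TRUE for these hypothetical clauses.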
In this section, we make some definitions that will be useful in the rest of the paper. For a clause β containing a variable X_i (associated with the value of J_{β,i}), we will say that X_i satisfies β if X_i = −J_{β,i}. In this case, the clause β is satisfied regardless of the values taken by the other variables. Conversely, we say that X_i violates β if X_i does not satisfy β. In this case, it is still possible that β is satisfied by other variables.
Definition 2. For a clause β ∈ C, we define u_{β,i} (resp. s_{β,i}) as the value of X_i ∈ {−1, +1} that violates (resp. satisfies) clause β. This means that s_{β,i} = −J_{β,i} and u_{β,i} = +J_{β,i}. We define the following sets:

V+(i) = {β ∈ V(i) : J_{β,i} = −1},     V−(i) = {β ∈ V(i) : J_{β,i} = +1},
V^s_β(i) = {γ ∈ V(i)\{β} : s_{γ,i} = s_{β,i}},     V^u_β(i) = {γ ∈ V(i)\{β} : s_{γ,i} = u_{β,i}}.

In the above definitions, V+(i) (resp. V−(i)) is the set of clauses that contain X_i as a positive (resp. negative) literal. V^s_β(i) (resp. V^u_β(i)) is the set of clauses containing X_i that agree (resp. disagree) with the clause β concerning X_i. These sets will be useful when we define the SP message passing algorithms for SAT instances.
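The sets of Definition 2 can be read off directly from the signs J_{β,i}. A minimal sketch, assuming each clause is stored as a dict from variable index to J_{β,i} (a representation chosen here for illustration):

```python
# V^s_beta(i): other clauses containing X_i with the same sign as beta
# (s_{gamma,i} = s_{beta,i} iff J_{gamma,i} = J_{beta,i});
# V^u_beta(i): other clauses containing X_i with the opposite sign.
def definition2_sets(clauses, i, beta):
    J = clauses[beta][i]
    Vs = [g for g, c in enumerate(clauses)
          if g != beta and i in c and c[i] == J]
    Vu = [g for g, c in enumerate(clauses)
          if g != beta and i in c and c[i] != J]
    return Vs, Vu
```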

Related Work on SAT
The SAT problem is well studied in computer science: as the archetypical NP-complete problem, it is common to reformulate other NP-complete problems such as graph coloring as a SAT problem (Prestwich, 2003). SAT solvers are either complete or incomplete. The best known complete solver for the SAT problem is probably the Davis-Putnam-Logemann-Loveland (DPLL) algorithm (Davis & Putnam, 1960; Davis, Logemann, & Loveland, 1962). The DPLL algorithm is a basic backtracking algorithm that runs by choosing a literal, assigning a truth value to it, simplifying the formula, and then recursively checking if the simplified formula is satisfiable; if this is the case, the original formula is satisfiable; otherwise, the same recursive check is done assuming the opposite truth value. Variants of the DPLL algorithm such as Chaff (Moskewicz & Madigan, 2001), MiniSat (Een & Sörensson, 2005), and RSAT (Pipatsrisawat & Darwiche, 2007) are among the best performers in recent SAT competitions (Berre & Simon, 2003, 2005). Solvers such as satz (Li & Anbulagan, 1997) and cnfs (Dubois & Dequen, 2001) have also been making progress in solving hard random 3-SAT instances.
Most solvers that participated in recent SAT competitions are complete solvers. While incomplete or stochastic solvers cannot show that a SAT instance is unsatisfiable, they are often able to solve larger satisfiable instances than complete solvers. Incomplete solvers usually start with a randomly initialized configuration, and different algorithms differ in the way they flip selected variables to move towards a solution. One disadvantage of such an approach is that in hard SAT instances, a large number of variables have to be flipped to move a current configuration out of a local minimum, which acts as a local trap. Incomplete solvers differ in the strategies used to move the configuration out of such traps. For example, simulated annealing (Kirkpatrick, Jr., & Vecchi, 1983) allows the search to move uphill, controlled by a temperature parameter. GSAT (Selman, Levesque, & Mitchell, 1992) and WalkSAT (Selman, Kautz, & Cohen, 1994) are two algorithms developed in the 1990s that allow randomized moves when the solution cannot be improved locally. The two algorithms differ in the way they choose the variables to flip. GSAT makes the change which minimizes the number of unsatisfied clauses in the new configuration, while WalkSAT selects the variable that, when flipped, results in no previously satisfied clauses becoming unsatisfied. Variants of algorithms such as WalkSAT and GSAT use various strategies, such as tabu search (McAllester, Selman, & Kautz, 1997) or adapting the noise parameter that is used, to help the search out of local minima (Hoos, 2002). Another class of approaches is based on applying discrete Lagrangian methods to SAT as a constrained optimization problem (Shang & Wah, 1998). The Lagrange multipliers are used as a force to lead the search out of local traps.
The SP algorithm (Braunstein et al., 2005) has been shown to beat the best incomplete solvers in solving hard random 3-SAT instances efficiently. SP estimates marginals on all variables and chooses a few of them to fix to a truth value. The size of the instance is then reduced by removing these variables, and SP is run again on the remaining instance. This iterative process is called decimation in the SP literature. It was shown empirically that SP rarely makes any mistakes in its decimation, and SP solves very large 3-SAT instances that are very hard for local search algorithms. Recently, Braunstein and Zecchina (2006) have shown that by modifying the BP and SP updates with a reinforcement term, the effectiveness of these algorithms as solvers can be further improved.

The Weighted Max-SAT Problem
The weighted Max-SAT problem is a generalization of the Max-SAT problem, where each clause is assigned a weight. We define an instance of the weighted Max-SAT problem as follows:

Definition 3. A weighted Max-SAT instance (X, C, W) in CNF consists of X, a vector of N variables taking values in {−1, +1}, C, a set of clauses, and W, the set of weights for the clauses in C. We define the energy of the weighted Max-SAT problem as

E(x) = ∑_{β∈C} w_β ∏_{i∈V(β)} (1 + J_{β,i} x_i)/2,

where x ∈ {−1, +1}^N, J_{β,i} takes values in {−1, +1}, and w_β is the weight of the clause β. The total energy, E(x), of a configuration x equals the total weight of the violated clauses.
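Definition 3 changes the energy of Definition 1 only by weighting each violated clause. A minimal sketch, with an assumed clause encoding of (i, J_{β,i}) pairs and hypothetical clauses and weights:

```python
# Weighted Max-SAT energy (Definition 3): sum of w_beta over clauses beta
# whose variables all take the violating value x_i = J_{beta,i}.
def weighted_energy(clauses, weights, x):
    return sum(w for clause, w in zip(clauses, weights)
               if all(x[i] == J for i, J in clause))

# Illustrative instance: (X1 or X2) with weight 2.5, (not X1 or X3)
# with weight 1.0; J = -1 encodes a positive literal, J = +1 a negative one.
clauses = [[(0, -1), (1, -1)], [(0, +1), (2, -1)]]
weights = [2.5, 1.0]
```

With weights all equal to 1, this reduces to the unweighted Max-SAT energy of Definition 1.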
Similarly to SAT, there are both complete and incomplete solvers for the weighted Max-SAT problem. Complete weighted Max-SAT solvers involve branch-and-bound techniques that calculate bounds on the cost function. Larrosa and Heras (2005) introduced a framework that integrates branch-and-bound techniques into a Max-DPLL algorithm for solving the Max-SAT problem. Incomplete solvers generally employ heuristics similar to those used for SAT problems. An example of an incomplete method is the min-conflicts hill-climbing with random walks algorithm (Minton, Philips, Johnston, & Laird, 1992). Many SAT solvers such as WalkSAT can be extended to solve weighted Max-SAT problems, where the weights are used as a criterion in the selection of variables to flip.
As a working example in this paper, we define the following instance of a weighted Max-SAT problem:

Example 2. We define a set of weighted Max-SAT clauses in a table. This weighted Max-SAT example has the same variables and clauses as the Max-SAT example given in Example 1. The table shows the clauses satisfied (a tick) or violated (a cross) by each of the 8 possible configurations of the 3 variables. In the first row of the table, the symbol − corresponds to the value −1, and + corresponds to +1; for example, the string "− − +" corresponds to the configuration (X_1, X_2, X_3) = (−1, −1, +1). The last row of the table shows the energy of the configuration in each column.
The factor graph for this weighted Max-SAT example is the same as the one for the Max-SAT example in Example 1. The differences between the two examples are in the clause weights, which are reflected in the joint distribution, but not in the factor graph. The energy for this example follows Definition 3, with each violated clause contributing its weight.
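Since the table of Example 2 enumerates all 8 configurations, the same enumeration can be reproduced mechanically. The sketch below does this for a hypothetical 3-variable weighted instance (the clauses and weights here are illustrative, not those of Example 2):

```python
import itertools

# Hypothetical weighted instance over (X1, X2, X3): (X1 or X2) with
# weight 2, (not X2 or X3) with weight 1, (not X1 or not X3) with
# weight 3; clauses are dicts i -> J_{beta,i}.
clauses = [{0: -1, 1: -1}, {1: +1, 2: -1}, {0: +1, 2: +1}]
weights = [2.0, 1.0, 3.0]

def energy(x):
    # Total weight of violated clauses, as in Definition 3.
    return sum(w for c, w in zip(clauses, weights)
               if all(x[i] == J for i, J in c.items()))

# One row of an Example-2-style table per configuration.
for x in itertools.product((-1, +1), repeat=3):
    print(x, energy(x))
```

The configuration minimizing this energy plays the role of the best column of the Example 2 table.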

Phase Transitions
The SP algorithm has been shown to work well on 3-SAT instances near the phase transition, where instances are known to be very hard to solve. The term "phase transition" arises from the physics community. To understand the notion of "hardness" in optimization problems, computer scientists and physicists have been studying the relationship between computational complexity in computer science and phase transitions in statistical physics.
In statistical physics, the phenomenon of phase transitions refers to the abrupt changes in one or more physical properties of thermodynamic or magnetic systems with a small change in the value of a variable such as the temperature. In computer science, it has been observed that in random ensembles of instances such as K-SAT, there is a sharp threshold where randomly generated problems undergo an abrupt change in properties. For example, in K-SAT, it has been observed empirically that as the clause-to-variable ratio α changes, randomly generated instances change abruptly from satisfiable to unsatisfiable at a particular value of α, often denoted as α_c. Moreover, instances generated with a value of α close to α_c are found to be extremely hard to solve.
Computer scientists and physicists have worked on bounding and calculating the precise value of α_c at which the phase transition for 3-SAT occurs. Using the cavity approach, physicists claim that α_c ≈ 4.267 (Mezard & Zecchina, 2002). While their derivation of the value of α_c is non-rigorous, it is based on this derivation that they formulated the SP algorithm. Using rigorous mathematical approaches, upper bounds on the value of α_c can be derived using first-order methods. For example, in the work of Kirousis, Kranakis, Krizanc, and Stamatiou (1998), α_c for 3-SAT was upper bounded by 4.571. Achlioptas, Naor and Peres (2005) lower bounded the value of α_c using a weighted second moment method, and their lower bound is close to the upper bounds for K-SAT ensembles for large values of K. However, their lower bound for 3-SAT is 2.68, rather far from the conjectured value of 4.267. A better (algorithmic) lower bound of 3.52 can be obtained by analyzing the behavior of algorithms that find SAT configurations (Kaporis, Kirousis, & Lalas, 2006).
Physicists have also shown rigorously, using second moment methods, that as α approaches α_c, the search space fractures dramatically, with many small solution clusters appearing relatively far apart from each other (Mezard, Mora, & Zecchina, 2005). Clusters of solutions are generally defined as connected components of the solution space, where two adjacent solutions have a Hamming distance of 1 (differ in one variable). Daude, Mezard, Mora, and Zecchina (2008) redefined the notion of clusters using the concept of x-satisfiability: a SAT instance is x-satisfiable if there exist two solutions differing in Nx variables, where N is the total number of variables. They showed that near the phase transition, x goes from around 1/2 to very small values, without going through a phase of intermediate values. This clustering phenomenon explains why instances generated with α close to α_c are extremely hard to solve with local search algorithms: it is difficult for a local search algorithm to move from a local minimum to the global minimum.

The Belief Propagation Algorithm
The BP algorithm has been reinvented in different fields under different names. For example, in the speech recognition community, the BP algorithm is known as the forward-backward procedure (Rabiner & Juang, 1993). On tree-structured factor graphs, the BP algorithm is simply a dynamic programming approach applied to the tree structure, and it can be shown that BP calculates the marginals for each variable in the factor graph (i.e., solving Problem 2). On loopy factor graphs, the BP algorithm has been found to provide a reasonable approximation to solving the marginal problem when the algorithm converges. In this case, the BP algorithm is often called the loopy BP algorithm. Yedidia, Freeman and Weiss (2005) have shown that the fixed points of the loopy BP algorithm correspond to the stationary points of the Bethe free energy, and that loopy BP is hence a sensible approximate method for estimating marginals.
In this section, we will first describe the BP algorithm as a dynamic programming method for solving the marginal problem (Problem 2) on tree-structured factor graphs. We will also briefly describe how the BP algorithm can be applied to factor graphs with loops, and refer the reader to the work of Yedidia et al. (2005) for the underlying theoretical justification in this case.
Given a factor graph representing a distribution p(x), the BP algorithm involves iteratively passing messages from factor nodes β ∈ F to variable nodes i ∈ V, and vice versa. Each factor node β represents a factor ψ_β in the joint distribution given in Equation 1. In Figure 3, we give an illustration of how the messages are passed between factor nodes and variable nodes. Each Greek letter (e.g., β ∈ F) in a square represents a factor (e.g., ψ_β), and each Roman letter (e.g., i ∈ V) in a circle represents a variable (e.g., X_i).
The factor-to-variable messages (e.g., M_{β→i}) and the variable-to-factor messages (e.g., M_{i→β}) are vectors of real numbers, with length equal to the cardinality of the variable X_i.

We denote by M_{β→i}(x_i) or M_{i→β}(x_i) the component of the vector corresponding to the value X_i = x_i.
The message update equations are as follows:

M_{β→i}(x_i) = ∑_{x_{V(β)}\x_i} ψ_β(x_{V(β)}) ∏_{j∈V(β)\i} M_{j→β}(x_j),     (15)

M_{i→β}(x_i) = ∏_{γ∈V(i)\β} M_{γ→i}(x_i),     (16)

where ∑_{x_{V(β)}\x_i} means summing over all configurations x_{V(β)} with X_i set to x_i. For a tree-structured factor graph, the message updates can be scheduled such that after two passes over the tree structure, the messages will converge. Once the messages converge, the beliefs at each variable node are calculated as follows:

b_i(x_i) = ∏_{β∈V(i)} M_{β→i}(x_i).     (17)

For a tree-structured graph, the normalized beliefs for each variable will be equal to its marginals.
INPUT: A joint distribution p(x) defined over a tree-structured factor graph ({V, F }, E), and a variable X i ∈ X.
OUTPUT: Exact marginals for the variable X i .
ALGORITHM : 1. Organize the tree so that X i is the root of the tree.
2. Start from the leaves, propagate the messages from child nodes to parent nodes right up to the root X i with Equations 15 and 16.
3. The marginals of X i can then be obtained as the normalized beliefs in Equation 17.
Figure 4: The BP algorithm for calculating the marginal of a single variable, X_i, on a tree-structured factor graph

The algorithm for calculating the exact marginals of a given variable X_i is given in Figure 4. This algorithm is simply a dynamic programming procedure for calculating the marginals, p_i(X_i), by organizing the sums so that the sums at the leaves are done first. For the simple example in Figure 1, the sum for calculating p_1(x_1) can be ordered as follows:

p_1(x_1) = (1/Z) [∑_{x_3} ψ_β′(x_1, x_3)] ∑_{x_2} ψ_β(x_1, x_2) ∑_{x_4} ψ_β′′(x_2, x_4).

The BP algorithm simply carries out this sum by using the node for X_1 as the root of the tree-structured factor graph in Figure 1.
The BP algorithm can also be used for calculating the marginals of all variables efficiently, with the message passing schedule given in Figure 5. This schedule involves selecting a random variable node as the root of the tree, passing the messages from the leaves to the root, and then back down to the leaves. After the two passes, all the message updates required in the algorithm in Figure 4 for any one variable will have been performed, and the beliefs of all the variables can be calculated from the messages. The normalized beliefs for each variable will be equal to the marginals for the variable.
INPUT: A joint distribution p(x) defined over a tree-structured factor graph ({V, F }, E).
OUTPUT: Exact marginals for all variables in V .

ALGORITHM :
1. Randomly select a variable as a root.
2. Upward pass: starting from the leaves, propagate messages from the leaves right up to the root.
3. Downward pass: from the root, propagate messages back down to the leaves.
4. Calculate the beliefs of all variables as given in Equation 17.
Figure 5: The BP algorithm for calculating the marginals of all variables on a tree-structured factor graph

If the factor graph is not tree-structured (i.e., contains loops), then the message updates cannot be scheduled in the simple way described in the algorithm in Figure 5. In this case, we can still apply BP by iteratively updating the messages with Equations 15 and 16, often in a round-robin manner over all factor-variable pairs. This is done until all the messages converge (i.e., the messages do not change over iterations). There is no guarantee that all the messages will converge for general factor graphs. However, if they do converge, it was observed that the beliefs calculated with Equation 17 are often a good approximation of the exact beliefs of the joint distribution (Murphy, Weiss, & Jordan, 1999). When applied in this manner, the BP algorithm is often called the loopy BP algorithm. Recently, Yedidia, Freeman and Weiss (2001, 2005) have shown that loopy BP has an underlying variational principle: the fixed points of the BP algorithm correspond to the stationary points of the Bethe free energy. This fact serves in some sense to justify the BP algorithm even when the factor graph it operates on has loops, because minimizing the Bethe free energy is a sensible approximation procedure for solving the marginal problem. We refer the reader to the work of Yedidia et al. (2005) for more details.
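The updates of Equations 15 and 16 and the beliefs of Equation 17 can be implemented in a few lines. The sketch below runs round-robin updates on a small tree-shaped graph in the style of Figure 1; the pairwise factor ψ(a, b) = exp(a·b) and the variable indexing are assumptions made for illustration.

```python
import itertools
import math

def psi(a, b):
    # Illustrative pairwise factor table.
    return math.exp(a * b)

factors = [((0, 1), psi), ((0, 2), psi), ((1, 3), psi)]
N, VALS = 4, (-1, +1)
var_nbrs = {i: [f for f, (scope, _) in enumerate(factors) if i in scope]
            for i in range(N)}

# Messages are dicts value -> real number, initialized uniformly.
msg_vf = {(i, f): {v: 1.0 for v in VALS} for i in range(N) for f in var_nbrs[i]}
msg_fv = {(f, i): {v: 1.0 for v in VALS} for i in range(N) for f in var_nbrs[i]}

for _ in range(10):  # round-robin sweeps; fixed after two sweeps on a tree
    for f, (scope, fn) in enumerate(factors):
        for i in scope:
            other = scope[0] if scope[1] == i else scope[1]
            for v in VALS:  # Equation 15: sum out the other variable
                msg_fv[(f, i)][v] = sum(
                    fn(*(v if j == i else u for j in scope)) * msg_vf[(other, f)][u]
                    for u in VALS)
    for i in range(N):
        for f in var_nbrs[i]:
            for v in VALS:  # Equation 16: product of other incoming messages
                msg_vf[(i, f)][v] = math.prod(
                    msg_fv[(g, i)][v] for g in var_nbrs[i] if g != f)

def marginal(i):
    # Equation 17, normalized.
    b = {v: math.prod(msg_fv[(f, i)][v] for f in var_nbrs[i]) for v in VALS}
    z = sum(b.values())
    return {v: b[v] / z for v in VALS}
```

On this tree the messages become fixed after two sweeps and the normalized beliefs equal the exact marginals; on a loopy graph the same loop implements loopy BP, with no convergence guarantee.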

Survey Propagation: The SP and SP-y Algorithms
Recently, the SP algorithm (Braunstein et al., 2005) has been shown to beat the best incomplete solvers at solving hard 3-SAT instances efficiently. The SP algorithm was first derived from principles in statistical physics, and can be explained using the cavity approach (Mezard & Parisi, 2003). It was first given a BP interpretation in the work of Braunstein and Zecchina (2004). In this section, we define the SP and SP-y algorithms for solving SAT and Max-SAT problems, using a warning propagation interpretation for these algorithms.

SP Algorithm for The SAT Problem
In Section 2.4.1, we defined a joint distribution for the SAT problem (X, C), where the energy of a configuration equals its number of violated clauses. In the factor graph ({V, F}, E) representing this joint distribution, the variable nodes in V correspond to the Boolean variables in X, and each factor node in F represents a clause in C. In this section, we provide an intuitive overview of the SP algorithm as it was formulated in the work of Braunstein et al. (2005).
The SP algorithm can be defined as a message passing algorithm on the factor graph ({V, F}, E). Each factor β ∈ F passes a single real number, η_{β→i}, to a neighboring variable X_i in the factor graph. This real number η_{β→i} is called a survey. According to the warning propagation interpretation given in the work of Braunstein et al. (2005), the survey η_{β→i} corresponds to the probability of the warning that the factor β is sending to the variable X_i. Intuitively, if η_{β→i} is close to 1, then the factor β is warning the variable X_i against taking a value that would violate the clause β. If η_{β→i} is close to 0, then the factor β is indifferent to the value taken by X_i, because the clause β is satisfied by other variables in V(β).
We first define the messages sent from a variable X_j to a neighboring factor β, as a function of the inputs from the other factors containing X_j, i.e. {η_{β'→j}}_{β'∈V(j)\β}. In SP, this message is a vector of three numbers, Π^u_{j→β}, Π^s_{j→β}, and Π^0_{j→β}, with the following interpretations: Π^u_{j→β} is the probability that X_j is warned (by other clauses) to take a value that will violate the clause β; Π^s_{j→β} is the probability that X_j is warned (by other clauses) to take a value that will satisfy the clause β; and Π^0_{j→β} is the probability that X_j is free to take any value. With these definitions, the update equations are as follows: These equations are defined using the sets of factors V^u_β(j) and V^s_β(j), which have been defined in Section 2.4.1. For the event where the variable X_j is warned to take a value violating β, it has to be (a) warned by at least one factor β' ∈ V^u_β(j) to take a satisfying value for β', and (b) all the other factors, in V^s_β(j), must not be sending warnings. In Equation 18, the probability of this event, Π^u_{j→β}, is a product of two terms, the first corresponding to event (a) and the second to event (b). Π^s_{j→β} and Π^0_{j→β} are defined in a similar manner. In Equation 21, the final survey η_{β→i} is simply the probability of the joint event that all the other variables X_j are violating the clause β, forcing the last variable X_i to satisfy β.
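The update equations referred to above (Equations 18 to 21) can be sketched in code. This is a sketch based on the standard SP equations of Braunstein et al. (2005); the function names are ours, and a full solver would iterate these updates over all clause-variable pairs until the surveys converge.

```python
def sp_update(eta_in_u, eta_in_s):
    """One SP variable-to-clause update (Equations 18-21 in spirit).

    eta_in_u: surveys from the clauses in V^u_beta(j) (clauses whose
    satisfaction pushes X_j toward violating beta); eta_in_s: surveys
    from the clauses in V^s_beta(j).  Returns the normalized message
    (pi_u, pi_s, pi_0).
    """
    prod_u = 1.0
    for eta in eta_in_u:
        prod_u *= 1.0 - eta   # probability of no warning from V^u
    prod_s = 1.0
    for eta in eta_in_s:
        prod_s *= 1.0 - eta   # probability of no warning from V^s
    pi_u = (1.0 - prod_u) * prod_s   # warned toward violating beta
    pi_s = (1.0 - prod_s) * prod_u   # warned toward satisfying beta
    pi_0 = prod_u * prod_s           # no warning at all
    z = pi_u + pi_s + pi_0
    return pi_u / z, pi_s / z, pi_0 / z

def survey(pi_u_list):
    """Clause-to-variable survey (Equation 21 in spirit): a warning is
    sent only if every other variable in the clause is forced to
    violate it."""
    eta = 1.0
    for pi_u in pi_u_list:
        eta *= pi_u
    return eta
```

For example, a variable receiving no surveys at all yields the message (0, 0, 1) (it is free), so the survey it contributes is 0: an unconstrained variable never forces a warning.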
The SP algorithm consists of iteratively running the above update equations until the surveys converge. Once the surveys have converged, we can calculate local biases as follows: To solve an instance of the SAT problem, the SP algorithm is run until it converges, and a few highly constrained variables are set to their preferred values. The SAT instance is thereby reduced to a smaller instance, and SP can be run again on the smaller instance. This continues until SP fails to set any more variables, at which point a local search algorithm such as WalkSAT is run on the remaining instance. This procedure, called the survey inspired decimation algorithm (Braunstein et al., 2005), is given in Figure 6.

The SP-y Algorithm
In contrast to the SP algorithm, the SP-y algorithm's objective is to solve Max-SAT instances, and hence clauses are allowed to be violated, at a price. The SP algorithm can be understood as a special case of the SP-y algorithm, with y taken to infinity (Battaglia et al., 2004). In SP-y, a penalty factor of exp(−2y) is multiplied into the distribution for each violated clause. Hence, although the message passing algorithm allows clauses to be violated, as the value of y increases the surveys increasingly prefer configurations that violate a minimal number of clauses.
INPUT: A SAT problem, and a constant k.
OUTPUT: A satisfying configuration, or report FAILURE.
1. Randomly initialize the surveys.
2. Iteratively update the surveys using Equations 18 to 21.
3. If SP does not converge, go to step 7.

4. If SP converges, calculate W^+_i and W^−_i using Equations 25 and 26.
5. Decimation: sort all variables based on the absolute difference |W^+_i − W^−_i|, and set the top k variables to their preferred values. Simplify the instance with these variables removed.
6. If all surveys equal zero (no variables can be removed in step 5), output the simplified SAT instance. Otherwise, go back to the first step with the smaller instance.
7. Run WalkSAT on the remaining simplified instance, and output a satisfying configuration if WalkSAT succeeds.Otherwise output FAILURE.
Figure 6: The survey inspired decimation (SID) algorithm for solving a SAT problem (Braunstein et al., 2005)

The SP-y algorithm can still be understood as a message passing algorithm over factor graphs. As in SP, each factor β passes a survey, η_{β→i}, to a neighboring variable X_i, corresponding to the probability of a warning. To simplify notation, we define η^+_{β→i} (resp. η^−_{β→i}) to be the probability of the warning against taking the value +1 (resp. −1), and we define η^0_{β→i} = 1 − η^+_{β→i} − η^−_{β→i}. In practice, since a clause can only warn against either +1 or −1 but not both, either η^+_{β→i} or η^−_{β→i} equals zero: η^{J_{β,i}}_{β→i} = η_{β→i} and η^{−J_{β,i}}_{β→i} = 0, where J_{β,i} is defined in Definition 1.
Since clauses can be violated, it is insufficient to simply keep track of whether a variable has been warned against a value or not. It is now necessary to keep track of how many times the variable has been warned against each value, so that we know how many clauses will be violated if the variable were to take a particular value. Let H^+_{j→β} (resp. H^−_{j→β}) be the number of times the variable X_j is warned by the factors in {β'}_{β'∈V(j)\β} against the value +1 (resp. −1). In SP-y, the variable X_j will be forced by β to take the value +1 if H^+_{j→β} is smaller than H^−_{j→β}, and the penalty in this case will be exp(−2yH^+_{j→β}). In the notation used in the work of Battaglia et al. (2004) describing SP-y, let h_{j→β} = H^+_{j→β} − H^−_{j→β}. Battaglia et al. (2004) defined the SP-y message passing equations that calculate the probability distribution over h_{j→β}, based on the input surveys, where |V(j)| refers to the cardinality of the set V(j). The unnormalized distributions P̂_{j→β}(h) are calculated as follows: where δ(h) = 1 if h = 0 and zero otherwise, and θ(h) = 1 if h ≥ 0 and zero otherwise.
The above equations take into account each neighbor of j excluding β, from γ = 1 to γ = |V(j)| − 1. A penalty of exp(−2y) is multiplied in every time the value of h_{j→β} decreases in absolute value as a new neighbor of X_j, β_γ, is added. At the end of the procedure, this is equivalent to multiplying the messages by a factor of exp(−2y × min(H^+_{j→β}, H^−_{j→β})). The unnormalized P̂_{j→β}(h) are then normalized into P_{j→β}(h) by computing P̂_{j→β}(h) for all possible values of h in [−|V(j)| + 1, |V(j)| − 1]. The message updates for the surveys are as follows: The quantity W^+_{j→β} (resp. W^−_{j→β}) is the probability of all events warning against the value +1 (resp. −1). Equation 34 reflects the fact that a warning is sent from β to the variable X_i if and only if all the other variables in β are warning β that they are going to violate β.
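The effect of the exp(−2y × min(H^+, H^−)) penalty on the field distribution can be illustrated by brute force. Unlike the efficient recursive convolution of Battaglia et al. (2004), this sketch simply enumerates all joint warning outcomes of the neighboring clauses; the representation and function name are ours.

```python
import itertools
import math

def field_distribution(surveys, y):
    """Unnormalized distribution over the field h = H+ - H- at a variable.

    surveys: list of (eta_plus, eta_minus) pairs from the neighboring
    clauses (excluding the receiving clause): each clause warns against
    +1 with probability eta_plus, against -1 with probability eta_minus,
    and sends no warning otherwise.  Each joint outcome is weighted by
    exp(-2*y*min(H+, H-)), the SP-y penalty for the warnings that must
    be violated whichever value the variable takes.
    """
    dist = {}
    for outcome in itertools.product((+1, -1, 0), repeat=len(surveys)):
        p = 1.0
        h_plus = h_minus = 0
        for w, (ep, em) in zip(outcome, surveys):
            if w == +1:           # warning against the value +1
                p *= ep
                h_plus += 1
            elif w == -1:         # warning against the value -1
                p *= em
                h_minus += 1
            else:                 # no warning
                p *= 1.0 - ep - em
        p *= math.exp(-2.0 * y * min(h_plus, h_minus))
        h = h_plus - h_minus
        dist[h] = dist.get(h, 0.0) + p
    return dist
```

For instance, two neighbors sending certain but contradictory warnings produce a single outcome at h = 0 with weight exp(−2y): one warning must be violated no matter which value the variable takes.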
When SP-y converges, the preference of each variable is calculated as follows: where the P_j(h) are calculated in a similar manner as the P_{j→β}(h), except that β is not excluded from the calculations.
With the above definitions for the message updates, the SP-y algorithm can be used to solve Max-SAT instances by a survey inspired decimation algorithm similar to the one for SP given in Figure 6. At each iteration of the decimation process, the SP-y decimation procedure selects variables to fix to their preferred values based on the quantity given in Equation 38. In the work of Battaglia et al. (2004), an additional backtracking process was introduced to make the decimation process more robust. This backtracking process allows the decimation procedure to unfix variables that have already been fixed. For a variable X_j fixed to the value x_j, the quantity in Equation 39 is calculated; variables are ranked according to this quantity, and the top variables are chosen to be unfixed. In Figure 7, we show the backtracking decimation algorithm for SP-y (Battaglia et al., 2004), where the value of y is either given as input or determined empirically.
INPUT: A Max-SAT instance and a constant k.Optional input: y in and a backtracking probability r.
OUTPUT: A configuration.
1. Randomly initialize the surveys.
2. If y_in is given, set y = y_in. Otherwise, determine the value of y with the bisection method.
3. Run SP-y until convergence. If SP-y converges, for each variable X_i, extract a random number q ∈ [0, 1].
(a) If q > r, sort the variables according to Equation 38 and fix the top k most biased variables.
(b) If q < r, sort the variables according to Equation 39 and unfix the top k most biased variables.
4. Simplify the instance based on step (3). If SP-y converged and returned a non-paramagnetic solution (a paramagnetic solution refers to a set {b_fix(j)}_{j∈V} in which no variable is biased toward any value), go to step (1).
5. Run weighted WalkSAT on the remaining instance and outputs the best configuration found.
Figure 7: The survey inspired decimation (SID) algorithm for solving a Max-SAT instance (Battaglia et al., 2004)

Relaxed Survey Propagation
It was shown (Maneva et al., 2004; Braunstein & Zecchina, 2004) that SP for the SAT problem can be reformulated as a BP algorithm on an extended factor graph. However, their formulation does not generalize to explain the SP-y algorithm, which is applicable to Max-SAT problems. In a previous paper (Chieu & Lee, 2008), we extended the formulation in the work of Maneva et al. (2004) to address the Max-SAT problem. In this section, we modify the formulation in our previous paper (Chieu & Lee, 2008) to address the weighted Max-SAT problem, by setting up an extended factor graph on which we run the BP algorithm. In Theorem 3, we show that this formulation generalizes the BP interpretation of SP given in the work of Maneva et al. (2004), and in the main theorem (Theorem 2), we show that running the loopy BP algorithm on this factor graph estimates marginals over covers of configurations violating a set of clauses with minimal total weight. We first define the concept of covers in Section 5.1, before defining the extended factor graph in Section 5.2. In the rest of this section, given a weighted Max-SAT problem (X, C, W), we assume that variables in X take values in {−1, +1, *}: the third value is a "don't care" state, corresponding to a no-warning message for the SP algorithm defined in Section 4.

Covers in Weighted Max-SAT
First, we need to define the semantics of the value * as a "don't care" state.

Definition 4. (Maneva et al., 2004) Given a configuration x, we say that a variable X_i is the unique satisfying variable for a clause β ∈ C if it is assigned s_{β,i} while all other variables X_j in the clause are assigned u_{β,j} (see Definition 2 for the definitions of s_{β,i} and u_{β,i}). A variable X_i is said to be constrained by the clause β if it is the unique satisfying variable for β. A variable is unconstrained if it is not constrained by any clause. Define the following, where Ind(P) equals 1 if the predicate P is true, and 0 otherwise.
In the following definition, we redefine when a configuration taking values in {−1, +1, *} satisfies or violates a clause.

Definition 5. A configuration satisfies a clause β if and only if (i) β contains a variable X_i set to the value s_{β,i}, or (ii) at least two variables in β take the value *. A configuration violates a clause β if all the variables X_j in β are set to u_{β,j}. A configuration x is invalid for a clause β if and only if exactly one of the variables in β is set to *, and all the other variables in β are set to u_{β,j}. A configuration is valid if it is valid for all clauses in C.
The above definition for invalid configurations reflects the interpretation that the * value is a "don't care" state: clauses containing a variable X i = * should already be satisfied by other variables, and the value of X i does not matter.So X i = * cannot be the last remaining possibility of satisfying any clause.In the case where a clause contains two variables set to * , the clause can be satisfied by either one of these two variables, so the other variable can take the "don't care" value.
We define a partial order on the set of all valid configurations as follows (Maneva et al., 2004):

Definition 6. Let x and y be two valid configurations. We write x ≤ y if, for every i, either x_i = y_i or x_i = *.

This partial order defines a lattice, and Maneva et al. (2004) showed that SP is a "peeling" procedure that peels a satisfying configuration down to its minimal element in the lattice. A cover is a minimal element of the lattice. In the SAT region, a cover can be defined as follows (Kroc, Sabharwal, & Selman, 2007):

Definition 7. A cover is a valid configuration x ∈ {−1, +1, *}^N that satisfies all clauses and has no unconstrained variables assigned −1 or +1.
The SP algorithm was shown to return marginals over covers (Maneva et al., 2004). In principle, there are two kinds of covers: true covers, which correspond to satisfying configurations, and false covers, which do not. Kroc et al. (2007) showed empirically that the number of false covers is negligible for SAT instances. For RSP to apply to weighted Max-SAT instances, we introduce the notion of a v-cover:

Definition 8. A v-cover is a valid configuration x ∈ {−1, +1, *}^N such that 1. the total weight of clauses violated by the configuration equals v, and 2. x has no unconstrained variables assigned −1 or +1.
Hence the covers defined in Definition 7 are simply v-covers with v = 0 (i.e. 0-covers).
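Definitions 5 through 8 can be checked mechanically. The following sketch encodes a clause as a list of (variable, s_{β,i}) pairs, uses 0 for the * value (so u_{β,i} = −s_{β,i}), and tests whether a configuration is a v-cover. The encoding and function names are ours, and the two-clause instance in the usage note is an illustrative one, not Example 2 from the paper.

```python
def clause_status(clause, x):
    """Definition 5.  clause: list of (var, s) pairs, where s is the
    value s_{beta,i} satisfying the literal; x[i] is in {-1, +1, 0},
    with 0 standing for the * ("don't care") value."""
    stars = sum(1 for v, s in clause if x[v] == 0)
    sat = sum(1 for v, s in clause if x[v] == s)
    if sat >= 1 or stars >= 2:
        return 'sat'
    if stars == 1:
        return 'invalid'  # exactly one *, all other variables violating
    return 'violated'     # every variable takes its violating value -s

def violated_weight(clauses, weights, x):
    """Total weight of violated clauses, or None if x is invalid."""
    total = 0.0
    for clause, w in zip(clauses, weights):
        status = clause_status(clause, x)
        if status == 'invalid':
            return None
        if status == 'violated':
            total += w
    return total

def constrained(clauses, x, i):
    """Definition 4: X_i is the unique satisfying variable of a clause."""
    for clause in clauses:
        lits = dict(clause)
        if x[i] == 0 or lits.get(i) != x[i]:
            continue
        # every other variable in the clause takes its violating value
        if all(x[j] == -s for j, s in clause if j != i):
            return True
    return False

def is_v_cover(clauses, weights, x, v):
    """Definition 8: valid, violates total weight exactly v, and every
    variable set to -1/+1 is constrained by at least one clause."""
    w = violated_weight(clauses, weights, x)
    if w is None or abs(w - v) > 1e-12:
        return False
    return all(x[i] == 0 or constrained(clauses, x, i) for i in range(len(x)))
```

For example, with the hypothetical clauses [(0, +1)] and [(0, +1), (1, −1)] of weight 1 each, the configuration (+1, *) is a 0-cover (X_0 is constrained by the unit clause), while (−1, *) is invalid: X_1 = * would be the last possibility of satisfying the second clause.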

The Extended Factor Graph
In this section, we will define a joint distribution over an extended factor graph that is positive only over v-covers. First, we need functions that will be used to define the factors of the extended factor graph.

Definition 9. For each clause β ∈ C, the following function assigns different values to configurations that satisfy, violate, or are invalid for β (see Definition 5): In the above definition, we introduce a parameter y in the RSP algorithm, which plays a similar role to the y in the SP-y algorithm. The term exp(−w_β y) is the penalty for violating a clause with weight w_β.

Definition 10. (Maneva et al., 2004) Given a configuration x, we define the parent set P_i(x) of a variable X_i to be the set of clauses for which X_i = x_i is the unique satisfying variable in the configuration x (i.e. the set of clauses constraining X_i to its value). Formally, In Example 2, for the configuration x = (+1, −1, −1), the parent sets are P_1(x) = {β_5, β_6}, P_2(x) = {β_2}, and P_3(x) = ∅.
Given the weighted Max-SAT instance (X, C, W) and its factor graph G = ({V, F}, E), we now construct another distribution with an associated factor graph G_s = ({V, F_s}, E_s) as follows. For each i ∈ V, let P(i) be the set of all possible parent sets of the variable X_i. Due to the restrictions imposed by our definitions, P_i(x) must be contained in either V^+(i) or V^−(i), but not both. The extended variables take values in X_i := {−1, +1, *} × P(i). Hence this factor graph has the same number of variables as the original SAT instance, but each variable has a large cardinality. Given a configuration x for the SAT instance, we denote the corresponding configuration of Λ as λ(x) = {λ_i(x)}_{i∈V}, where λ_i(x) = (x_i, P_i(x)).
The definitions given so far define the semantics of valid configurations and parent sets, and in the rest of this section, we will define factors in the extended factor graph G s to ensure that the above definitions are satisfied by configurations of Λ.
The single variable compatibilities (Ψ_i) are defined by the following factor on each variable λ_i(x): The first case in the above definition, for P_i(x) = ∅ and x_i ≠ *, corresponds to the case where the variable X_i is unconstrained and yet takes a value in {−1, +1}. Valid configurations that are not v-covers (i.e. with unconstrained variables set to −1 or +1) have a zero value in this factor; hence only v-covers have positive values for these factors. In the last case of the definition, the validity of (x_i, P_i(x)) simply means that if x_i = +1 (resp. −1), then P_i(x) ⊆ V^+(i) (resp. V^−(i)).

The clause compatibilities (Ψ_β) are: where Ind is defined in Definition 4. These clause compatibilities introduce the penalties in VAL_β(x_{V(β)}) into the joint distribution. The second term in the above equation enforces that the parent sets P_k(x) are consistent with the definition of parent sets in Definition 10 for each variable X_k in the clause β.
The values of x determine uniquely the values of P = {P_i(x)}_{i∈V}, and hence the distribution over λ(x) = {(x_i, P_i(x))}_{i∈V} is simply a distribution over x.
Theorem 1. Using the notation UNSAT(x) to represent the set of all clauses violated by x, the underlying distribution p(Λ) of the factor graph defined in this section is positive only over v-covers, and for a v-cover x, we have:

Proof. Configurations that are not v-covers are either invalid or contain unconstrained variables set to −1 or +1. For invalid configurations, the distribution is zero because of the definition of VAL_β, and for configurations with unconstrained variables set to −1 or +1, the distribution is zero due to the definition of the factors Ψ_i. For each v-cover, the total penalty from the violated clauses is the product term in Equation 45.
The definitions above specify a joint distribution over a factor graph, and the RSP algorithm is a message passing algorithm defined on this factor graph:

Definition 11. The RSP algorithm is defined as the loopy BP algorithm applied to the extended factor graph G_s associated with a Max-SAT instance (X, C, W).
In Section 6, we will formulate the message passing updates for RSP, as well as a decimation algorithm for using RSP as a solver for weighted Max-SAT instances.As an example, Figure 8 shows the extended factor graph for the weighted Max-SAT instance defined in Example 1.
Definition 12.We define a min-cover for a weighted Max-SAT instance as the m-cover, where m is the minimum total weight of violated clauses for the instance.
Theorem 2. When y is taken to ∞, RSP estimates marginals over min-covers in the following sense: the stationary points of the RSP algorithm correspond to the stationary points of the Bethe free energy on a distribution that is uniform over min-covers.

Proof. The ratio of the probability of a v-cover to that of a (v + ε)-cover equals exp(εy). When y is taken to ∞, the distribution in Equation 45 is positive only over min-covers. Hence RSP, as the loopy BP algorithm over the factor graph representing Equation 45, estimates marginals over min-covers.
In the application of RSP to weighted Max-SAT instances, taking y to ∞ would often cause the RSP algorithm to fail to converge.Taking y to a sufficiently large value is often sufficient for RSP to be used to solve weighted Max-SAT instances.
In Figure 9, we show the v-covers of the small weighted Max-SAT instance in Example 2. In this example, there is a unique min-cover with X_1 = +1, X_2 = −1, and X_3 = *.

Maneva et al. (2004) formulated the SP-ρ algorithm, which is equivalent to the SP algorithm (Braunstein et al., 2005) for ρ = 1. The SP-ρ algorithm is the loopy BP algorithm on the extended factor graph defined in the work of Maneva et al. (2004). Comparing the definitions of the extended factor graphs and factors for RSP and SP-ρ, we have (Chieu & Lee, 2008):

Theorem 3. When y is taken to ∞, RSP is equivalent to SP-ρ with ρ = 1.

Proof. The definitions of the joint distribution for SP-ρ with ρ = 1 (Maneva et al., 2004) and for RSP in this paper differ only in Definition 9, and with y → ∞ in RSP, the definitions become identical. Since SP-ρ and RSP are both equivalent to loopy BP on the distributions defined on their extended factor graphs, the equivalence of the joint distributions means that the algorithms are equivalent.
Taking y to infinity corresponds to disallowing violated clauses, and SP-ρ was formulated for satisfiable SAT instances, where all clauses must be satisfied.For SP-ρ, clause weights are inconsequential as all clauses have to be satisfied.
In this paper, we disallow unconstrained variables from taking the values −1 or +1. In Appendix A, we give an alternative definition for the single variable potentials in Equation 43. With this definition, Maneva et al. (2004) define a smoothing interpretation for SP-ρ. This smoothing can also be applied to RSP. See Theorem 6 in the work of Maneva et al. (2004) and Appendix A for more details.

The Importance of Convergence
It has been found that message passing algorithms such as BP and SP perform well whenever they converge (e.g., see Kroc, Sabharwal, & Selman, 2009). While the success of the RSP algorithm on random ensembles of Max-SAT and weighted Max-SAT instances is believed to be due to the clustering phenomenon in such problems, we found that RSP can also be successful in cases where the clustering phenomenon is not observed. We believe that the presence of large clusters helps the SP algorithm converge well, but as long as the SP algorithm converges, the presence of clusters is not necessary for good performance.
When covers are simply Boolean configurations (with no variables taking the * value), they represent singleton clusters. We call such covers degenerate covers. In many structured, non-random weighted Max-SAT problems, we have found that the recovered covers are often degenerate. In a previous paper (Chieu, Lee, & Teh, 2008), we defined a modified version of RSP for energy minimization over factor graphs, and showed in Lemma 2 of that paper that configurations containing * have zero probability, i.e. all covers are degenerate. In that paper, we also showed that the value of y can be tuned to favor the convergence of the RSP algorithm.
In Section 7.3, we show the success of RSP on a few benchmark Max-SAT instances.In trying to recover the covers of the configurations found by RSP, we found that all the benchmark instances used have degenerate covers.The fact that RSP converged on these instances is sufficient for RSP to outperform local search algorithms.

Using RSP for Solving the Weighted Max-SAT Problem
In the previous section, we defined the RSP algorithm in Definition 11 to be the loopy BP algorithm over the extended factor graph.In this section, we will derive the RSP message passing algorithm based on this definition, before giving the decimation-based algorithm used for solving weighted Max-SAT instances.

The Message Passing Algorithm
The variables in the extended factor graph are no longer Boolean. They are of the form λ_i(x) = (x_i, P_i(x)), which have large cardinalities. In the definition of the BP algorithm, we stated that the message vector passed between factors and variables has length equal to the cardinality of the variable. In this section, we show that the messages passed in RSP can be grouped, so that each message passed between variables and factors has only three values.
In RSP, the factor-to-variable messages are grouped as follows: where S ⊆ V^s_β(i) (all cases where the variable x_i is constrained by the clause β). The resulting updates differ from those of SP-y in the following ways:

1. In RSP, the penalties are imposed as each factor passes a message to a variable. For SP-y, the penalties are imposed when a variable compiles all the incoming warnings and decides how many factors it is going to violate.

2. Importantly, in RSP, variables participating in violated clauses can never take the * value. For SP-y, a variable receiving an equal number of warnings from the set of factors {β'}_{β'∈V(i)\β} against taking the +1 and the −1 values (i.e. h_{j→β} = H^+_{j→β} − H^−_{j→β} = 0) will pass a message with no warning to β. Hence for SP-y, it is possible for variables in violated clauses to take the "don't care" state.

3. In the work of Battaglia et al. (2004), where SP-y was formulated with the cavity approach, it was found that the optimal value of y for a given Max-SAT problem is y* = δΣ/δe, where Σ is the complexity in statistical physics and e is the energy density (Mezard & Zecchina, 2002). They stated that y* is finite when the energy of the Max-SAT problem is not zero. In Theorem 2, we show that for RSP, y should be as large as possible, so that the underlying distribution is over min-covers. Our experimental results in Figure 12 show that this is indeed true for RSP, as long as it converges.

4. If RSP converges and at least one variable is set, go back to step (1) with the simplified instance. Otherwise, run the (weighted) WalkSAT solver on the simplified instance and output the configuration found.
Figure 11: The decimation algorithm for RSP for solving a (weighted) Max-SAT instance

The Decimation Algorithm
The decimation algorithm is shown in Figure 11. This is the algorithm we used for our experiments described in Section 7. In comparing RSP with SP-y on random Max-SAT instances in Section 7.1, we run both algorithms with a fixed y_in, and vary y_in over a range of values. Comparing Figure 11 to Figure 7 for SP-y, the condition used in SP-y to check for a paramagnetic solution is replaced by the condition given in Step (2) of Figure 11. In the experimental results in Section 7.1, we used the SP-y implementation available online (Battaglia et al., 2004), which contains a mechanism for backtracking on decimation decisions (see Figure 7). In Section 7.1, RSP still outperforms SP-y despite not backtracking on its decisions. When running RSP on weighted Max-SAT, we found it necessary to adjust y dynamically during the decimation process. For details on the experimental settings, please refer to Section 7.
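The decimation loop of Figure 11 can be sketched as a driver routine. This is a skeleton only: `run_rsp`, `simplify`, and `local_search` are hypothetical callbacks standing in for an RSP implementation, instance simplification, and weighted WalkSAT, and the bias is taken here as the signed difference P(x_i = +1) − P(x_i = −1).

```python
def decimate(instance, k, y, run_rsp, simplify, local_search):
    """Fix the k most biased variables per round until RSP fails to
    converge or no variable is biased enough, then hand the remaining
    instance to a local search solver (cf. Figure 11)."""
    fixed = {}
    while True:
        # hypothetical: returns {var: P(+1) - P(-1)}, or None on
        # non-convergence (where the value of y would be adjusted)
        biases = run_rsp(instance, y)
        if biases is None:
            break
        ranked = sorted(biases.items(), key=lambda kv: -abs(kv[1]))
        chosen = [(i, +1 if b > 0 else -1)
                  for i, b in ranked[:k] if abs(b) > 0.5]
        if not chosen:
            break  # no variable passes the bias threshold
        fixed.update(chosen)
        instance = simplify(instance, chosen)
    return fixed, local_search(instance)
```

With trivial stub callbacks the loop fixes k variables per round until the instance is exhausted, mirroring the structure of the algorithm in Figure 11 without committing to any particular RSP implementation.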

Experimental Results
We run experiments on random Max-3-SAT, random weighted Max-SAT, as well as on a few benchmark Max-SAT instances used in the work of Lardeux, Saubion, and Hao (2005).

Random Max-3-SAT
We run experiments on randomly generated Max-3-SAT instances with 10^4 variables, at different clause-to-variable ratios. The random instances are generated by the SP-y code available online (Battaglia et al., 2004). In Figure 12, we compare SP-y and RSP on random Max-3-SAT at different clause-to-variable ratios α. We vary α from 4.2 to 5.2 to show the performance of SP-y and RSP in the UNSAT region of 3-SAT, beyond its phase transition at α_c ≈ 4.267. For each value of α, the number of violated clauses (y-axis) is plotted against the value of y used.
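The random ensemble can be reproduced with a simple generator. This is a sketch of the standard random 3-SAT ensemble (three distinct variables per clause, each negated with probability 1/2); in the paper the instances were produced by the SP-y code itself, so the function name and encoding here are ours.

```python
import random

def random_max3sat(n_vars, alpha, seed=0):
    """Generate a random Max-3-SAT instance with M = round(alpha * N)
    clauses.  Each clause is a list of (variable, sign) literals, with
    sign +1 for a positive literal and -1 for a negated one."""
    rng = random.Random(seed)
    m = int(round(alpha * n_vars))
    clauses = []
    for _ in range(m):
        variables = rng.sample(range(n_vars), 3)  # three distinct variables
        clauses.append([(v, rng.choice((-1, +1))) for v in variables])
    return clauses
```

For instance, `random_max3sat(10**4, 4.2)` yields an instance at ratio α = 4.2, just inside the UNSAT region studied above.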
Table 1: Number of violated clauses attained by each method. For SP-y, "SP-y (BT)" (SP-y with backtracking), and RSP, the best result is selected over all y. For each α, we show the best performance in bold face. The column "Fix" shows the number of variables fixed by RSP at the optimal y, and "Time" the time taken by RSP (in minutes) to fix those variables, on an AMD Opteron 2.2GHz machine.

For each y, RSP fixes a number of spins, and we stop increasing y when the number of spins fixed decreases relative to the previous value of y. For UBCSAT, we run 1000 iterations for each of the 20 solvers. Results are shown in Table 2. Out of the seven instances, RSP fails to fix any spins on the first one, but outperforms UBCSAT on the rest. Lardeux et al. (2005) did not report best performances in their paper, but their average results were an order of magnitude higher than the results in Table 2. Figure 12 shows that finding a good y for SP-y is hard. On the benchmark instances, we ran SP-y with the "-Y" option (Battaglia et al., 2004), which uses dichotomic search for y: SP-y failed to fix any spins on all 7 instances.
Table 2: Benchmark Max-SAT instances. Columns: "instance" shows the instance name in the paper of Lardeux et al. (2005), "nvar" the number of variables, "ubcsat" and "rsp-x" (where x is the number of decimations at each iteration) the number of violated clauses returned by each algorithm, and "fx-x" the number of spins fixed by RSP.
Best results are indicated in bold face.

The success of the SP family of algorithms on random ensembles of SAT or Max-SAT problems is usually attributed to the clustering phenomenon in such random ensembles. As the benchmark instances are not random, we attempted to see whether the configurations found by RSP do indeed belong to a cover representing a cluster of solutions. Rather disappointingly, we found that for all 6 solutions where RSP outperformed local search algorithms, the variables in the solutions are all constrained by at least one clause. Hence, the v-covers found are degenerate covers, i.e. the covers do not contain variables set to *. It appears that the success of RSP on these benchmark instances is not due to the clustering phenomenon, but simply to the fact that RSP manages to converge on these instances for some value of y. Kroc, Sabharwal, and Selman (2009) made a similar observation: the convergence of BP or SP-like algorithms is often sufficient for obtaining a good solution to a given problem. As discussed in Section 5.3, the ability to vary y to improve convergence is a useful feature of RSP, but one that is distinct from its ability to exploit the clustering phenomenon.

Conclusion
While recent work on Max-SAT and weighted Max-SAT tends to focus on complete solvers, these solvers are unable to handle large instances. In the Max-SAT competition 2007 (Argelich, Li, Manya, & Planes, 2007), the largest Max-3-SAT instances used had only 70 variables. For large instances, complete solvers are still not practical, and local search procedures have been the only feasible alternative. SP-y, which generalizes SP, has been shown to be able to solve large Max-3-SAT instances at the phase transition, but lacks the theoretical explanations that recent work on SP has generated. For 3-SAT, there is an easy-hard-easy transition as the clause-to-variable ratio increases. For Max-3-SAT, however, it has been shown empirically that beyond the satisfiability phase transition, all instances are hard to solve (Zhang, 2001). In this paper, we have shown that RSP outperforms SP-y as well as other local search algorithms on Max-SAT and weighted Max-SAT instances, well beyond the phase transition region.
Both RSP and SP-y do well on Max-SAT instances near the phase transition. The mechanisms behind SP-y and RSP are similar: both algorithms impose a penalty term for each violated constraint, and both reduce to SP when y → ∞. SP-y uses a population dynamics algorithm, which can also be seen as a warning propagation algorithm. In this paper, we have formulated the RSP algorithm as a BP algorithm over an extended factor graph, enabling us to understand RSP as estimating marginals over min-covers.
Case ω_* = 0: in this case, only configurations x with n_*(x) = 0 have non-zero probability in the distribution given in Equation 56. Hence the value * is forbidden, and all variables take values in {−1, +1}. A Boolean configuration violating clauses with total weight W has a probability proportional to exp(−yW), and so we retrieve the weighted Max-SAT energy defined in Equation 13. In this case, the factor graph is equivalent to the original weighted Max-SAT factor graph defined in Definition 3, and hence RSP-ρ is equivalent to the loopy BP algorithm on the original weighted Max-SAT problem.
Case ω_* ≠ 1 and ω_* ≠ 0: in this case, every valid configuration x violating clauses with a total weight W has a probability proportional to ω_0^(n_0(x)) ω_*^(n_*(x)) exp(−yW). Hence, the probability of a v-cover in the case where ω_* = 1 is spread over the lattice for which it is the minimal element.
With the above formulation, RSP-ρ can be seen as a family of algorithms that includes both the BP and the RSP algorithm, moving from BP to RSP as ρ (or ω_*) varies from 0 to 1.

Figure 2: The factor graph for the SAT instance given in Example 1. Dotted (resp. solid) lines joining a variable to a clause mean the variable is a negative (resp. positive) literal in the clause.

Figure 3: Illustration of messages in a BP algorithm.

Figure 8: The extended factor graph for the SAT instance given in Example 1. The factor nodes β_i correspond to the clause compatibility factors Ψ_{β_i}, and the single-variable factor nodes γ_i represent the single-variable compatibility factors Ψ_i. This factor graph is similar to the original factor graph of the SAT instance in Figure 2, except that it has additional factor nodes γ_i.
INPUT: A (weighted) Max-SAT instance, a constant k, and y_in.
OUTPUT: A configuration.
ALGORITHM:
1. Randomly initialize the surveys and set y = y_in.
2. Run RSP with y. If RSP converges, sort the variables according to the quantities b_i = |P(x_i = +1) − P(x_i = −1)|, and fix the top k variables to their preferred values, subject to the condition that b_i > 0.5.
3. (For weighted Max-SAT) If RSP fails to converge, adjust the value of y.
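Step 2 above, ranking variables by bias and fixing the most polarized ones, can be sketched as follows. Here `marginals` is a hypothetical map from variable index to the estimated marginal P(x_i = +1) returned by a converged RSP run; the function name is illustrative.

```python
def select_decimation(marginals, k, threshold=0.5):
    """Rank variables by bias b_i = |P(x_i=+1) - P(x_i=-1)| and fix
    the top k whose bias exceeds the threshold to their preferred
    value (+1 or -1), as in step 2 of the decimation procedure."""
    ranked = []
    for i, p_plus in marginals.items():
        bias = abs(p_plus - (1.0 - p_plus))  # b_i = |P(+1) - P(-1)|
        preferred = +1 if p_plus >= 0.5 else -1
        ranked.append((bias, i, preferred))
    ranked.sort(reverse=True)  # most polarized variables first
    # Fix at most k variables, and only those with b_i > threshold.
    return {i: v for bias, i, v in ranked[:k] if bias > threshold}
```

In the usual decimation loop, the formula would then be simplified under these fixed values and RSP rerun on the residual instance.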

Figure 12: Behavior of SP-y and RSP over varying values of y on the x-axis, with the number of violated clauses (#viols) on the y-axis. The comparison of the performance of RSP and SP-y is shown in Table 1. The graphs in this figure show that the behavior of RSP over varying y is consistent with Theorem 2: as long as RSP converges, its performance improves as y increases. In the graph, RSP reaches a plateau when it fails to converge. This property allows for a systematic search for a good value of y. The behavior of SP-y over varying y is less consistent.
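The systematic search for y suggested by this property can be sketched as a simple loop: raise y while RSP still converges, since by Theorem 2 performance improves with y up to the point of non-convergence. The `run_rsp` callable below is a hypothetical interface, returning True iff RSP converged at the given y.

```python
def search_y(run_rsp, y_init, step, y_max):
    """Increase y in fixed steps while RSP still converges, and
    return the largest convergent y found (or None if even y_init
    fails). run_rsp(y) -> bool is assumed to report convergence."""
    best_y = None
    y = y_init
    while y <= y_max:
        if not run_rsp(y):  # RSP stopped converging: y is too large
            break
        best_y = y
        y += step
    return best_y
```

A finer-grained variant could bisect between the last convergent and first non-convergent y, but the fixed-step scan already matches the plateau behavior described in the caption.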

Figure 13: Experimental results for weighted Max-SAT instances. The x-axis shows the value of α, and the y-axis (W-viol) is the number of violated clauses returned by each algorithm.