Automated Reasoning in Modal and Description Logics via SAT Encoding: the Case Study of K(m)/ALC-Satisfiability

In the last two decades, modal and description logics have been applied to numerous areas of computer science, including knowledge representation, formal verification, database theory, distributed computing and, more recently, semantic web and ontologies. For this reason, the problem of automated reasoning in modal and description logics has been thoroughly investigated. In particular, many approaches have been proposed for efficiently handling the satisfiability of the core normal modal logic K(m), and of its notational variant, the description logic ALC. Although simple in structure, K(m)/ALC is computationally very hard to reason on, its satisfiability being PSPACE-complete. In this paper we start exploring the idea of performing automated reasoning tasks in modal and description logics by encoding them into SAT, so that they can be handled by state-of-the-art SAT tools; as with most previous approaches, we begin our investigation from the satisfiability in K(m). We propose an efficient encoding, and we test it on an extensive set of benchmarks, comparing the approach with the main state-of-the-art tools available. Although the encoding is necessarily worst-case exponential, our experiments show that, in practice, this approach can handle most or all of the problems which are within the reach of the other approaches, with performance comparable with, or even better than, that of the current state-of-the-art tools.


Motivations and Goals
In the last two decades, modal and description logics have provided an essential framework for many applications in numerous areas of computer science, including artificial intelligence, formal verification, database theory, distributed computing and, more recently, semantic web and ontologies. For this reason, the problem of automated reasoning in modal and description logics has been thoroughly investigated (e.g., Fitting, 1983; Ladner, 1977; Baader & Hollunder, 1991; Halpern & Moses, 1992; Baader, Franconi, Hollunder, Nebel, & Profitlich, 1994; Massacci, 2000). In particular, the research in modal and description logics followed two parallel routes until the seminal work of Schild (1991), which proved that the core modal logic $K_m$ and the core description logic ALC are notational variants of each other. Since then, analogous results have been produced for several other logics, so that nowadays the two research lines have mostly merged into one research flow.
Many approaches have been proposed for efficiently reasoning in modal and description logics, starting from the problem of checking the satisfiability in the core normal modal logic $K_m$ and in its notational variant, the description logic ALC (hereafter simply "$K_m$"). We classify them as follows.
© 2009 AI Access Foundation. All rights reserved.
• The DPLL-based approach (Giunchiglia & Sebastiani, 1996, 2000) differs from the previous one mostly in the fact that a Davis-Putnam-Logemann-Loveland (DPLL) procedure, which treats the modal subformulas as propositions, is used instead of the classic propositional tableaux procedure at each nesting level of the modal operators. KSAT (Giunchiglia & Sebastiani, 1996), ESAT (Giunchiglia, Giunchiglia, & Tacchella, 2002) and *SAT (Tacchella, 1999) are the representative tools of this approach.
These two approaches merged into the "modern" tableaux-based approach, which has been extended to work with more expressive description logics and to provide more sophisticated reasoning functions. Among the tools employing this approach, we recall FaCT/FaCT++ and DLP (Horrocks & Patel-Schneider, 1999), and Racer (Haarslev & Moeller, 2001).¹
• In the translational approach (Hustadt & Schmidt, 1999; Areces, Gennari, Heguiabehere, & de Rijke, 2000) the modal formula is encoded into first-order logic (FOL), and the encoded formula can be decided efficiently by a FOL theorem prover (Areces et al., 2000). Mspass (Hustadt, Schmidt, & Weidenbach, 1999) is the most representative tool of this approach.
• The CSP-based approach (Brand, Gennari, & de Rijke, 2003) differs from the tableaux-based and DPLL-based ones mostly in the fact that a CSP (Constraint Satisfaction Problem) engine is used instead of tableaux/DPLL. KCSP is the only representative tool of this approach.
• In the Inverse-method approach (Voronkov, 1999, 2001), a search procedure is based on the "inverted" version of a sequent calculus (which can be seen as a modalized version of propositional resolution). KK (Voronkov, 1999) is the only representative tool of this approach.
• In the Automata-theoretic approach, (a symbolic representation, based on BDDs (Binary Decision Diagrams), of) a tree automaton accepting all the tree models of the input formula is implicitly built and checked for emptiness (Pan, Sattler, & Vardi, 2002; Pan & Vardi, 2003). KBDD (Pan & Vardi, 2003) is the only representative tool of this approach.
• Pan and Vardi (2003) also presented an encoding of K-satisfiability into QBF-satisfiability (which is PSpace-complete too), combined with the use of a state-of-the-art QBF (Quantified Boolean Formula) solver. We call this approach the QBF-encoding approach.
To the best of our knowledge, the last four approaches are so far restricted to the satisfiability in $K_m$ only, whilst the translational approach has been applied to numerous modal and description logics (e.g., traditional modal logics like $T_m$ and $S4_m$, and dynamic modal logics) and to the relational calculus. A significant number of benchmark formulas have been produced for testing the effectiveness of the different techniques (Halpern & Moses, 1992; Giunchiglia, Roveri, & Sebastiani, 1996; Heuerding & Schwendimann, 1996; Horrocks, Patel-Schneider, & Sebastiani, 2000; Massacci, 1999; Patel-Schneider & Sebastiani, 2001, 2003).
In the last two decades we have also witnessed an impressive advance in the efficiency of propositional satisfiability techniques (SAT), which has brought large and previously-intractable problems within the reach of state-of-the-art SAT solvers. Most of the success of SAT technologies is due to the impressive efficiency reached by current implementations of the DPLL procedure (Davis & Putnam, 1960; Davis, Logemann, & Loveland, 1962) in its most modern variants (Silva & Sakallah, 1996; Moskewicz, Madigan, Zhao, Zhang, & Malik, 2001; Eén & Sörensson, 2004). Current implementations can handle formulas in the order of $10^7$ variables and clauses.
In this paper we start exploring the idea of performing automated reasoning tasks in modal and description logics by encoding them into SAT, so that they can be handled by state-of-the-art SAT tools; as with most previous approaches, we begin our investigation from the satisfiability in $K_m$.
In theory, the task may look hopeless because of worst-case complexity issues: in fact, with few exceptions, the satisfiability problem in most modal and description logics is not in NP, typically being PSpace-complete or even harder (PSpace-complete for $K_m$; Ladner, 1977; Halpern & Moses, 1992), so that the encoding is non-polynomial in the worst case.²
In practice, however, a few considerations suggest that this approach may be competitive with the state-of-the-art approaches. First, the non-polynomial bounds above are worst-case bounds, and formulas may behave differently from the pathological formulas which can be found in textbooks. (E.g., notice that the exponentiality is based on the hypothesis of unboundedness of some parameter like the modal depth; Halpern & Moses, 1992; Halpern, 1995.) Second, some tricks in the encoding may allow for reducing the size of the encoded formula significantly. Third, as the amount of RAM in current computers is in the order of gigabytes and current SAT solvers can successfully handle huge formulas, the encoding of many modal formulas (at least of those which are not too hard to solve for the competitors either) may be within the reach of a SAT solver. Finally, even for PSpace-complete logics like $K_m$, other state-of-the-art approaches are not guaranteed to use polynomial memory either.
In this paper we show that, at least for the satisfiability in $K_m$, by exploiting some smart optimizations in the encoding the SAT-encoding approach becomes competitive in practice with previous approaches. To this end, the contributions of this paper are manifold.
• We propose a basic encoding of $K_m$-formulas into purely-propositional ones, and prove that the encoding is satisfiability-preserving.
• We describe some optimizations of the encoding, both in the form of preprocessing and of on-the-fly simplification. These techniques allow for significant (and in some cases dramatic) reductions in the size of the resulting Boolean formulas, and consequently for improvements in the performance of the SAT solver.
• We perform a very extensive empirical comparison against the main state-of-the-art tools available. We show that, despite the NP-vs.-PSpace issue, this approach can handle most or all of the problems which are within the reach of the other approaches, with performance comparable with, and sometimes even better than, that of the current state-of-the-art tools. In our perspective, this is the most surprising contribution of the paper.
• As a byproduct of our work, we obtain an empirical evaluation of the currently available tools for $K_m$-satisfiability, which is very extensive in terms of both amount and variety of benchmarks and of number and representativeness of the tools evaluated. We are not aware of any other such evaluation in the recent literature.
We also stress the fact that with our approach the encoder can be interfaced with every SAT solver in a plug-and-play manner, so as to benefit for free from every improvement in the technology of SAT solvers which has been or will be made available.
Content. The paper is structured as follows. In Section 2 we provide the necessary background notions on modal logics and SAT. In Section 3 we describe the basic encoding from $K_m$ to SAT. In Section 4 we describe and discuss the main optimizations, and provide many examples. In Section 5 we present the empirical evaluation, and discuss the results. In Section 6 we present some related work and current research trends. In Section 7 we conclude, and describe some possible future evolutions.
A six-page preliminary version of this paper, containing some of the basic ideas presented here, was presented at the SAT'06 conference (Sebastiani & Vescovi, 2006). For the readers' convenience, an online appendix is provided, containing all plots of Section 5 in full size. Moreover, in order to make the results reproducible, the encoder, the benchmarks and the random generators with the seeds used are also available in the online appendix.

Background
In this section we provide the necessary background on the modal logic $K_m$ (Section 2.1) and on SAT and the DPLL procedure (Section 2.2).
In order to make our presentation more uniform, and to avoid considering the polarity of subformulas, we adopt the traditional representation of $K_m$-formulas (introduced, as far as we know, by Fitting, 1983, and widely used in the literature, e.g., Fitting, 1983; Massacci, 2000; Donini & Massacci, 2000), in which non-literal $K_m$-formulas are grouped into four categories: α's (conjunctive), β's (disjunctive), π's (existential), ν's (universal). Importantly, all such formulas occur in the main formula with positive polarity only. This allows for disregarding the issue of polarity of subformulas.
The semantics of modal logics is given by means of Kripke structures. A Kripke structure for $K_m$ is a tuple $M = \langle U, L, R_1, \ldots, R_m \rangle$, where $U$ is a set of states, $L$ is a function $L : \mathcal{A} \times U \to \{True, False\}$, and each $R_r$ is a binary relation on the states of $U$. With an abuse of notation we write "$u \in M$" instead of "$u \in U$". We call a situation any pair $\langle M, u \rangle$, $M$ being a Kripke structure and $u \in M$. The binary relation $\models$ between a modal formula $\varphi$ and a situation $\langle M, u \rangle$ is defined in the standard way, by induction on the structure of $\varphi$.
The problem of determining the $K_m$-satisfiability of a $K_m$-formula $\varphi$ is decidable and PSPACE-complete (Ladner, 1977; Halpern & Moses, 1992), even restricting the language to a single Boolean atom (i.e., $\mathcal{A} = \{A_1\}$; Halpern, 1995); if we impose a bound on the modal depth of the $K_m$-formulas, the problem becomes NP-complete (Halpern, 1995). For a more detailed description of $K_m$ (including, e.g., axiomatic characterization, decidability and complexity results) we refer the reader to the works of Halpern and Moses (1992), and Halpern (1995).
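The satisfaction relation between a formula and a situation can be illustrated with a small model checker. This is only a sketch under stated assumptions: the tuple representation of formulas, the dictionary-based representation of $L$ and of the relations $R_r$, and the name `holds` are all illustrative choices that do not come from the paper, and the semantic clauses used are the textbook ones for $K_m$.

```python
# Formulas are nested tuples (an illustrative encoding, not the paper's):
#   ("atom", "A1"), ("not", f), ("and", f, g), ("or", f, g),
#   ("box", r, f), ("dia", r, f)          with r the modality index

def holds(model, u, phi):
    """Return True iff the situation <model, u> satisfies phi."""
    labels, relations = model   # labels: {(atom, state): bool}; relations: {r: {(u, v), ...}}
    op = phi[0]
    if op == "atom":
        return labels.get((phi[1], u), False)
    if op == "not":
        return not holds(model, u, phi[1])
    if op == "and":
        return holds(model, u, phi[1]) and holds(model, u, phi[2])
    if op == "or":
        return holds(model, u, phi[1]) or holds(model, u, phi[2])
    if op in ("box", "dia"):
        successors = [v for (w, v) in relations.get(phi[1], set()) if w == u]
        check = all if op == "box" else any   # box: every successor; diamond: some successor
        return check(holds(model, v, phi[2]) for v in successors)
    raise ValueError(f"unknown operator: {op}")

# A two-state structure: state 0 has a single R_1-successor, state 1, where A1 holds.
model = ({("A1", 1): True}, {1: {(0, 1)}})
```

Note that a box formula holds vacuously in a state without successors, whereas a diamond formula never does; this asymmetry is what the π (existential) and ν (universal) categories capture.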
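A $K_m$-formula is said to be in Negative Normal Form (NNF) if it is written in terms of the symbols $\Box_r$, $\Diamond_r$, ∧, ∨ and propositional literals $A_i$, $\neg A_i$ (i.e., if all negations occur only before propositional atoms in $\mathcal{A}$). Every $K_m$-formula $\varphi$ can be converted into an equivalent one $NNF(\varphi)$ by recursively applying the rewriting rules:

$$\neg\Box_r\varphi \Longrightarrow \Diamond_r\neg\varphi, \quad \neg\Diamond_r\varphi \Longrightarrow \Box_r\neg\varphi, \quad \neg(\varphi_1 \wedge \varphi_2) \Longrightarrow (\neg\varphi_1 \vee \neg\varphi_2), \quad \neg(\varphi_1 \vee \varphi_2) \Longrightarrow (\neg\varphi_1 \wedge \neg\varphi_2), \quad \neg\neg\varphi \Longrightarrow \varphi.$$

A $K_m$-formula is said to be in Box Normal Form (BNF) (Pan et al., 2002; Pan & Vardi, 2003) if it is written in terms of the symbols $\Box_r$, $\neg\Box_r$, ∧, ∨, and propositional literals $A_i$, $\neg A_i$ (i.e., if there are no diamonds, and all negations occur only before boxes or before propositional atoms in $\mathcal{A}$). Every $K_m$-formula $\varphi$ can be converted into an equivalent one $BNF(\varphi)$ by recursively applying the rewriting rules:

$$\Diamond_r\varphi \Longrightarrow \neg\Box_r\neg\varphi, \quad \neg(\varphi_1 \wedge \varphi_2) \Longrightarrow (\neg\varphi_1 \vee \neg\varphi_2), \quad \neg(\varphi_1 \vee \varphi_2) \Longrightarrow (\neg\varphi_1 \wedge \neg\varphi_2), \quad \neg\neg\varphi \Longrightarrow \varphi.$$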
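The two normal forms can be computed by a single recursive pass that carries the pending negation downwards. The following is a minimal sketch, assuming the same illustrative tuple representation of formulas used for the model checker above (binary `"and"`/`"or"`, `("box", r, f)`, `("dia", r, f)`); the function names `nnf` and `bnf` are hypothetical and are not the paper's implementation.

```python
DUAL = {"and": "or", "or": "and", "box": "dia", "dia": "box"}

def nnf(phi, negated=False):
    """Negative Normal Form: negations end up only in front of atoms."""
    op = phi[0]
    if op == "atom":
        return ("not", phi) if negated else phi
    if op == "not":
        return nnf(phi[1], not negated)
    new_op = DUAL[op] if negated else op          # De Morgan and box/diamond duality
    if op in ("and", "or"):
        return (new_op, nnf(phi[1], negated), nnf(phi[2], negated))
    return (new_op, phi[1], nnf(phi[2], negated))  # op is "box" or "dia"

def bnf(phi, negated=False):
    """Box Normal Form: no diamonds, negations only before boxes and atoms."""
    op = phi[0]
    if op == "atom":
        return ("not", phi) if negated else phi
    if op == "not":
        return bnf(phi[1], not negated)
    if op in ("and", "or"):
        new_op = DUAL[op] if negated else op
        return (new_op, bnf(phi[1], negated), bnf(phi[2], negated))
    if op == "box":
        box = ("box", phi[1], bnf(phi[2]))
        return ("not", box) if negated else box
    # op is "dia": dia_r f is rewritten as not box_r not f
    box = ("box", phi[1], bnf(phi[2], True))
    return box if negated else ("not", box)
```

In this sketch a negated box stays a (negated) box under `bnf`, while `nnf` turns it into a diamond over the negated body; this is the difference exploited later when BNF is preferred for the encoding.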

Propositional Satisfiability with the DPLL Algorithm
Most state-of-the-art SAT procedures are evolutions of the DPLL procedure (Davis & Putnam, 1960; Davis et al., 1962). A high-level schema of a modern DPLL engine, adapted from the one presented by Zhang and Malik (2002), is reported in Figure 1. The Boolean formula ϕ is in CNF (Conjunctive Normal Form); the assignment µ is initially empty, and it is updated in a stack-based manner.
In the main loop, decide_next_branch(ϕ, µ) chooses an unassigned literal l from ϕ according to some heuristic criterion, and adds it to µ. (This operation is called decision, l is called decision literal and the number of decision literals in µ after this operation is called the decision level of l.) In the inner loop, deduce(ϕ, µ) iteratively deduces literals l deriving from the current assignment and updates ϕ and µ accordingly; this step is repeated until either µ satisfies ϕ, or µ falsifies ϕ, or no more literals can be deduced, returning sat, conflict and unknown respectively. (The iterative application of Boolean deduction steps in deduce is also called Boolean Constraint Propagation, BCP.) In the first case, DPLL returns sat. In the second case, analyze_conflict(ϕ, µ) detects the subset η of µ which caused the conflict (conflict set) and the decision level blevel to backtrack to. If blevel == 0, then a conflict exists even without branching, so that DPLL returns unsat. Otherwise, backtrack(blevel, ϕ, µ) adds the clause ¬η to ϕ (learning) and backtracks up to blevel (backjumping), updating ϕ and µ accordingly. In the third case, DPLL exits the inner loop, looking for the next decision.
Notably, modern DPLL implementations implement techniques, like the two-watched-literal scheme, which allow for extremely efficient handling of BCP (Moskewicz et al., 2001; Zhang & Malik, 2002). Old versions of DPLL also used to implement the Pure-Literal Rule (PLR) (Davis et al., 1962): when one proposition occurs only positively (resp. negatively) in the formula, it can be safely assigned to true (resp. false). Modern DPLL implementations, however, often do not implement it anymore due to its computational cost. For a much deeper description of modern DPLL-based SAT solvers, we refer the reader to the literature (e.g., Zhang & Malik, 2002).
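To make the interplay between decision and BCP concrete, the following is a deliberately simplified sketch of the classic recursive DPLL scheme. It is not the engine schema described above: conflict analysis, learning, backjumping and watched literals are all omitted, and chronological backtracking is used instead. Clauses are lists of non-zero integers (a negative integer is a negated variable); the function names are illustrative.

```python
def unit_propagate(clauses, assignment):
    """Boolean Constraint Propagation.

    Returns (remaining_clauses, assignment), or (None, assignment) on conflict.
    """
    assignment = dict(assignment)
    changed = True
    while changed:
        changed = False
        remaining = []
        for clause in clauses:
            if any(assignment.get(abs(l)) == (l > 0) for l in clause):
                continue                              # clause already satisfied
            rest = [l for l in clause if abs(l) not in assignment]
            if not rest:
                return None, assignment               # every literal is false: conflict
            if len(rest) == 1:                        # unit clause: its literal is forced
                assignment[abs(rest[0])] = rest[0] > 0
                changed = True
            remaining.append(rest)
        clauses = remaining
    return clauses, assignment

def dpll(clauses, assignment=None):
    """Return a (possibly partial) satisfying assignment, or None if unsatisfiable."""
    clauses, assignment = unit_propagate(clauses, assignment or {})
    if clauses is None:
        return None
    if not clauses:
        return assignment
    variable = abs(clauses[0][0])                     # naive decision heuristic
    for value in (True, False):
        result = dpll(clauses, {**assignment, variable: value})
        if result is not None:
            return result
    return None
```

For instance, `dpll([[1, 2], [-1], [-2, 3]])` is solved by BCP alone, without any decision, whereas the four binary clauses over two variables used in the test below require branching on both values before unsatisfiability is established.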

The Basic Encoding
We borrow some notation from the Single Step Tableau (SST) framework (Massacci, 2000; Donini & Massacci, 2000). We uniquely represent states in $M$ as labels $\sigma$, represented as non-empty sequences of integers $1.n_1^{r_1}.n_2^{r_2}.\ldots.n_k^{r_k}$, s.t. the label 1 represents the root state, and $\sigma.n^r$ represents the $n$-th $R_r$-successor of $\sigma$ (where $r \in \{1, \ldots, m\}$). With a little abuse of notation, hereafter we may say "a state $\sigma$" meaning "a state labeled by $\sigma$". We call a labeled formula a pair $\langle\sigma, \psi\rangle$, such that $\sigma$ is a state label and $\psi$ is a $K_m$-formula, and we call labeled subformulas of a labeled formula $\langle\sigma, \psi\rangle$ all the labeled formulas $\langle\sigma, \phi\rangle$ such that $\phi$ is a subformula of $\psi$.
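Let $A_{\langle\_,\_\rangle}$ be an injective function which maps a labeled formula $\langle\sigma, \psi\rangle$, s.t. $\psi$ is not in the form $\neg\phi$, into a Boolean variable $A_{\langle\sigma, \psi\rangle}$. We conventionally assume that $A_{\langle\sigma, \top\rangle}$ is $\top$ and $A_{\langle\sigma, \bot\rangle}$ is $\bot$. Let $L_{\langle\sigma, \psi\rangle}$ denote $\neg A_{\langle\sigma, \phi\rangle}$ if $\psi$ is in the form $\neg\phi$, $A_{\langle\sigma, \psi\rangle}$ otherwise. Given a $K_m$-formula $\varphi$, the encoder $K_m2SAT$ builds a Boolean CNF formula as follows:

$$
\begin{aligned}
K_m2SAT(\varphi) &:= A_{\langle 1, \varphi\rangle} \wedge Def(1, \varphi) && (1)\\
Def(\sigma, \top) &:= \top && (2)\\
Def(\sigma, \bot) &:= \top && (3)\\
Def(\sigma, A_i) &:= \top && (4)\\
Def(\sigma, \neg A_i) &:= \top && (5)\\
Def(\sigma, \alpha) &:= (L_{\langle\sigma, \alpha\rangle} \to (L_{\langle\sigma, \alpha_1\rangle} \wedge L_{\langle\sigma, \alpha_2\rangle})) \wedge Def(\sigma, \alpha_1) \wedge Def(\sigma, \alpha_2) && (6)\\
Def(\sigma, \beta) &:= (L_{\langle\sigma, \beta\rangle} \to (L_{\langle\sigma, \beta_1\rangle} \vee L_{\langle\sigma, \beta_2\rangle})) \wedge Def(\sigma, \beta_1) \wedge Def(\sigma, \beta_2) && (7)\\
Def(\sigma, \pi^{r,j}) &:= (L_{\langle\sigma, \pi^{r,j}\rangle} \to L_{\langle\sigma.j, \pi_0^{r,j}\rangle}) \wedge Def(\sigma.j, \pi_0^{r,j}) && (8)\\
Def(\sigma, \nu^r) &:= \bigwedge_{\text{for every } \langle\sigma, \pi^{r,i}\rangle} \Big( ((L_{\langle\sigma, \nu^r\rangle} \wedge L_{\langle\sigma, \pi^{r,i}\rangle}) \to L_{\langle\sigma.i, \nu_0^r\rangle}) \wedge Def(\sigma.i, \nu_0^r) \Big) && (9)
\end{aligned}
$$

Here by "$\pi^{r,j}$" we mean that $\pi^{r,j}$ is the $j$-th distinct $\pi^r$ formula labeled by $\sigma$. Notice that (6) and (7) generalize to the case of n-ary ∧ and ∨ in the obvious way: if $\phi$ is $\bigotimes_{i=1}^n \phi_i$ s.t. $\otimes \in \{\wedge, \vee\}$, then $Def(\sigma, \phi) := (L_{\langle\sigma, \phi\rangle} \to \bigotimes_{i=1}^n L_{\langle\sigma, \phi_i\rangle}) \wedge \bigwedge_{i=1}^n Def(\sigma, \phi_i)$. Although conceptually trivial, this fact has an important practical consequence: in order to encode $\bigotimes_{i=1}^n \phi_i$ one needs to add only one Boolean variable rather than up to $n-1$, see Section 4.2. Notice also that in rule (9) the literals of the type $L_{\langle\sigma, \pi^{r,i}\rangle}$ are strictly necessary; in fact, the SAT problem must consider and encode all the possibly occurring states, but it can be the case, e.g., that a $\pi^{r,i}$ formula occurring in a disjunction is assigned to false for a particular state label $\sigma$ (which, in SAT, corresponds to assigning $L_{\langle\sigma, \pi^{r,i}\rangle}$ to false). In this situation all the labeled formulas regarding the state label $\sigma.i$ are useless, in particular those generated by the expansion of the ν formulas interacting with $\pi^{r,i}$.⁴

We assume that the $K_m$-formulas are represented as DAGs (Directed Acyclic Graphs), so as to avoid the expansion of the same $Def(\sigma, \psi)$ more than once. Then the various $Def(\sigma, \psi)$ are expanded in a breadth-first manner wrt. the tree of labels, that is, all the possible expansions for the same (newly introduced) $\sigma$ are completed before starting the expansions for a different state label $\sigma'$, and different state labels are expanded in the order they are introduced (thus all the expansions for a given state are always handled before those of any deeper state). Moreover, following Massacci (2000), we assume that, for each $\sigma$, the $Def(\sigma, \psi)$'s are expanded in the order: α/β, π, ν. Thus, each $Def(\sigma, \nu^r)$ is expanded after the expansion of all $Def(\sigma, \pi^{r,i})$'s, so that $Def(\sigma, \nu^r)$ will generate one clause $((L_{\langle\sigma, \nu^r\rangle} \wedge L_{\langle\sigma, \pi^{r,i}\rangle}) \to L_{\langle\sigma.i, \nu_0^r\rangle})$ and one novel definition $Def(\sigma.i, \nu_0^r)$ for each $Def(\sigma, \pi^{r,i})$ expanded.⁵

Intuitively, it is easy to see that $K_m2SAT(\varphi)$ mimics the construction of an SST tableau expansion (Massacci, 2000; Donini & Massacci, 2000). We have the following fact.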
The complete proof of Theorem 1 can be found in Appendix A.
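The expansion described above (α/β first, then π, then ν, breadth-first over the labels) can be sketched in a few dozen lines. This is an illustrative reconstruction from the prose, not the authors' encoder: the tuple representation of formulas, the helper names (`neg`, `classify`, `km2sat`, `satisfiable`), the representation of a label as a Python tuple, and the brute-force satisfiability check (usable only on toy inputs) are all assumptions. Inputs are assumed to be free of double negations and of ⊤/⊥.

```python
from itertools import count, product

def neg(psi):
    """Negate psi, cancelling a double negation."""
    return psi[1] if psi[0] == "not" else ("not", psi)

def classify(psi):
    """Return (kind, modality index, components) in the alpha/beta/pi/nu grouping."""
    negated = psi[0] == "not"
    core = psi[1] if negated else psi
    op = core[0]
    if op in ("and", "or"):
        parts = [neg(p) if negated else p for p in core[1:]]
        return ("alpha" if (op == "and") != negated else "beta"), None, parts
    if op in ("box", "dia"):
        body = neg(core[2]) if negated else core[2]
        return ("nu" if (op == "box") != negated else "pi"), core[1], [body]
    return "literal", None, []

def km2sat(phi):
    """Encode phi into CNF clauses (lists of ints), one label at a time."""
    ids, fresh = {}, count(1)

    def lit(label, psi):                      # the literal L<label, psi>
        if psi[0] == "not":
            return -lit(label, psi[1])
        if (label, psi) not in ids:
            ids[(label, psi)] = next(fresh)
        return ids[(label, psi)]

    root = (1,)
    clauses = [[lit(root, phi)]]              # the root labeled formula must hold
    todo, queue = {root: [phi]}, [root]       # labels are handled breadth-first
    while queue:
        label = queue.pop(0)
        pending, seen, pis, nus = todo[label], set(), [], []
        while pending:                        # alpha/beta definitions come first
            psi = pending.pop()
            if psi in seen:
                continue
            seen.add(psi)
            kind, r, parts = classify(psi)
            if kind == "alpha":               # one binary clause per conjunct
                clauses += [[-lit(label, psi), lit(label, p)] for p in parts]
                pending.extend(parts)
            elif kind == "beta":              # one clause for the whole disjunction
                clauses.append([-lit(label, psi)] + [lit(label, p) for p in parts])
                pending.extend(parts)
            elif kind in ("pi", "nu"):
                (pis if kind == "pi" else nus).append((psi, r, parts[0]))
        for j, (pi, r, body) in enumerate(pis, start=1):
            succ = label + ((r, j),)          # a new successor label for each pi-formula
            clauses.append([-lit(label, pi), lit(succ, body)])
            todo[succ] = [body]
            queue.append(succ)
            for nu, s, nu_body in nus:        # nu-formulas interact with same-index pi's
                if s == r:
                    clauses.append([-lit(label, nu), -lit(label, pi), lit(succ, nu_body)])
                    todo[succ].append(nu_body)
    return clauses

def satisfiable(clauses):
    """Brute-force SAT check, meant for tiny examples only."""
    variables = sorted({abs(l) for c in clauses for l in c})
    for values in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, values))
        if all(any(model[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False
```

As a usage example, the formula `("and", ("dia", 1, A1), ("box", 1, ("not", A1)))` with `A1 = ("atom", "A1")` yields an unsatisfiable clause set, while replacing the box index with 2 yields a satisfiable one, since a $\nu^2$ formula does not interact with a $\pi^1$ formula. In a realistic setting the resulting clauses would be handed to an external SAT solver rather than to `satisfiable`.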
Notice that, due to (9), the number of variables and clauses in $K_m2SAT(\varphi)$ may grow exponentially with $depth(\varphi)$. This is in accordance with what was stated by Halpern and Moses (1992).
It is easy to see that $\varphi_{nnf}$ is $K_1$-unsatisfiable⁶: the $\Diamond$-atoms impose that at least one atom $A_i$ is true in at least one successor of the root state, whilst the $\Box$-atoms impose that all atoms $A_i$ are false in all successor states of the root state.
After a run of Boolean constraint propagation (BCP), 3. reduces to the implicate disjunction. If the first element $A_{\langle 1, \Diamond A_1\rangle}$ is assigned to true, then by BCP we have a conflict on 4. and 6. If it is set to false, then the second element $A_{\langle 1, \Diamond(A_2 \vee A_3)\rangle}$ is assigned to true, and by BCP we have a conflict on 12. Thus $K_m2SAT(\varphi_{nnf})$ is unsatisfiable. ◇

Optimizations
The basic encoding of Section 3 is rather naive, and can be improved in many respects, in order to reduce the size of the output propositional formula, or to make it easier to solve by DPLL, or both. We distinguish two main kinds of optimizations:
Preprocessing steps, which are applied on the input modal formula before the encoding. Among them, we have Pre-conversion into BNF (Section 4.1), Atom Normalization (Section 4.2), Box Lifting (Section 4.3), and Controlled Box Lifting (Section 4.4).
On-the-fly simplification steps, which are applied to the Boolean formula under construction. Among them, we have On-the-fly Boolean Simplification and Truth Propagation Through Boolean Operators (Section 4.5), Truth Propagation Through Modal Operators (Section 4.6), On-the-fly Pure-Literal Reduction (Section 4.7), and On-the-fly Boolean Constraint Propagation (Section 4.8).
5. In practice, even if the definition of $K_m2SAT$ is recursive, the Def expansions are performed grouped by states. More precisely, all the $Def(\sigma.n, \psi)$ expansions, for any formula $\psi$ and every defined $n$, are done together (in the α/β, π, ν order described above) and necessarily after all the $Def(\sigma, \varphi)$ expansions have been completed.
6. For $K_1$-formulas we omit the box and diamond indexes, i.e., we write $\Box$, $\Diamond$ for $\Box_1$, $\Diamond_1$.
7. In all examples we report at the very end of each line, i.e. after each clause, the number of the $K_m2SAT$ encoding rule applied to generate that clause. We also drop the application of the rules (2), (3), (4) and (5).
We analyze these techniques in detail.
With NNF, instead, the negative occurrence $\neg\Box_r\psi$ is rewritten into $\Diamond_r(nnf(\neg\psi))$, so that two distinct Boolean atoms $A_{\langle\sigma, \Box_r(nnf(\psi))\rangle}$ and $A_{\langle\sigma, \Diamond_r(nnf(\neg\psi))\rangle}$ are generated; DPLL can assign them the same truth value, creating a hidden conflict which may require some extra Boolean search to reveal.⁸

Example 4.1 (BNF). We consider the BNF variant $\varphi_{bnf}$ of the $\varphi_{nnf}$ formula of Example 3.1. Unlike with the NNF formula $\varphi_{nnf}$ in Example 3.1, $K_m2SAT(\varphi_{bnf})$ is found unsatisfiable directly by BCP. In fact, the unit-propagation of $A_{\langle 1, \Box\neg A_1\rangle}$ from 2. causes $\neg A_{\langle 1, \Box\neg A_1\rangle}$ in 3. to be false, so that one of the two (unsatisfiable) branches induced by the disjunction is cut a priori. With $\varphi_{nnf}$, $K_m2SAT$ does not recognize $\Box\neg A_1$ and $\Diamond A_1$ to be one the negation of the other, so that two distinct atoms $A_{\langle 1, \Box\neg A_1\rangle}$ and $A_{\langle 1, \Diamond A_1\rangle}$ are generated. Hence $A_{\langle 1, \Box\neg A_1\rangle}$ and $A_{\langle 1, \Diamond A_1\rangle}$ cannot be recognized by DPLL to be one the negation of the other, so that DPLL may need to explore one more Boolean branch. ◇

In the following we will assume the formulas are in BNF (although most of the optimizations which follow work also for other representations).

8. Notice that this consideration holds for every representation involving both boxes and diamonds; we refer to NNF simply because it is the most popular of these representations.
9. Notice that the valid clause 6. can be dropped. See the explanation in Section 4.5.

Normalization of Modal Atoms
One potential source of inefficiency for DPLL-based procedures is the occurrence in the input formula of semantically-equivalent though syntactically-different modal atoms $\psi'$ and $\psi''$ (e.g., $\Box_1(A_1 \vee A_2)$ and $\Box_1(A_2 \vee A_1)$), which are not recognized as such by $K_m2SAT$. This causes the introduction of duplicated Boolean atoms $A_{\langle\sigma, \psi'\rangle}$ and $A_{\langle\sigma, \psi''\rangle}$ and, much worse, of duplicated subformulas $Def(\sigma, \psi')$ and $Def(\sigma, \psi'')$. This fact can have very negative consequences, in particular when $\psi'$ and $\psi''$ occur with negative polarity, because this causes the creation of distinct versions of the same successor states, and the duplication of whole parts of the output formula.
The latter will cause the creation of two distinct states 1.1 and 1.2. Thus, the recursive expansion of all $\Box_1$-formulas occurring positively in $\phi_1$, $\phi_2$, $\phi_3$ will be duplicated for these two states. ◇
In order to cope with this problem, as done by Giunchiglia and Sebastiani (1996), we apply some normalization steps to modal atoms with the intent of rewriting as many syntactically-different but semantically-equivalent modal atoms as possible into syntactically-identical ones. This can be achieved by a recursive application of some simple validity-preserving rewriting rules.
Sorting: modal atoms are internally sorted according to some criterion, so that atoms which are identical modulo reordering are rewritten into the same atom.
Flattening: the associativity of ∧ and ∨ is exploited and combinations of ∧'s or ∨'s are "flattened" into n-ary ∧'s or ∨'s respectively.
Flattening also has the advantage of reducing the number of novel atoms introduced in the encoding, as a consequence of the fact noticed in Section 3. One possible drawback of this technique is that it can reduce the sharing of subformulas, since a part which is common to two formulas may no longer be shared after flattening. However, we have found empirically that this drawback is negligible with respect to the advantages of flattening.
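The two normalization steps above can be combined in one bottom-up pass. The following is a minimal sketch, assuming the illustrative tuple representation of formulas used in the earlier sketches (with n-ary `"and"`/`"or"`); the sorting criterion, here the textual representation of the operands, is an arbitrary choice, since the text only requires "some criterion".

```python
def normalize(phi):
    """Flatten nested and/or of the same kind and sort their operands."""
    op = phi[0]
    if op == "atom":
        return phi
    if op == "not":
        return ("not", normalize(phi[1]))
    if op in ("box", "dia"):
        return (op, phi[1], normalize(phi[2]))
    operands = []
    for part in map(normalize, phi[1:]):
        if part[0] == op:
            operands.extend(part[1:])   # flattening: absorb a nested operator of the same kind
        else:
            operands.append(part)
    return (op, *sorted(operands, key=repr))   # sorting: any fixed total order would do
```

With this sketch, `("box", 1, ("or", A2, A1))` and `("box", 1, ("or", A1, A2))` are mapped to the same tuple, so an encoder that indexes Boolean atoms by the normalized formula would introduce a single atom, and a single successor state, for both.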

Box Lifting
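As a second preprocessing step, the $K_m$-formula can also be rewritten by recursively applying the $K_m$-validity-preserving "box lifting rules":

$$(\Box_r\varphi_1 \wedge \Box_r\varphi_2) \Longrightarrow \Box_r(\varphi_1 \wedge \varphi_2), \qquad (\neg\Box_r\varphi_1 \vee \neg\Box_r\varphi_2) \Longrightarrow \neg\Box_r(\varphi_1 \wedge \varphi_2). \qquad (10)$$

This has the potential benefit of reducing the number of $\pi^r$ formulas, and hence the number of labels $\sigma.i$ to take into account in the expansion of the $Def(\sigma, \nu^r)$'s (9). We call this preprocessing lifting.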
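As an illustration of how such a preprocessing pass might look, the sketch below merges, inside each conjunction, all the boxes with the same index into one box over a conjunction, and, inside each disjunction, all the negated boxes with the same index into one negated box over a conjunction. It assumes the illustrative tuple representation of the earlier sketches with n-ary `"and"`/`"or"`; the function name `lift` is hypothetical.

```python
def lift(phi):
    """Apply box lifting bottom-up to a formula in tuple form."""
    op = phi[0]
    if op == "atom":
        return phi
    if op == "not":
        return ("not", lift(phi[1]))
    if op in ("box", "dia"):
        return (op, phi[1], lift(phi[2]))
    grouped, others = {}, []                 # op is "and" or "or"
    for part in map(lift, phi[1:]):
        # In a conjunction we look for boxes, in a disjunction for negated boxes.
        box = part if op == "and" else (part[1] if part[0] == "not" else None)
        if box is not None and box[0] == "box":
            grouped.setdefault(box[1], []).append(box[2])
        else:
            others.append(part)
    for r, bodies in grouped.items():
        # Several bodies with the same index collapse into one box over their conjunction;
        # the new conjunction is lifted again, since it may expose further boxes.
        body = bodies[0] if len(bodies) == 1 else lift(("and", *bodies))
        merged = ("box", r, body)
        others.append(merged if op == "and" else ("not", merged))
    return others[0] if len(others) == 1 else (op, *others)
```

For example, a disjunction of two negated boxes with the same index becomes a single negated box, so that the encoding has to introduce one successor label instead of two.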

Controlled Box Lifting
One potential drawback of applying the lifting rules is that, by collapsing several box subformulas into a single one, the possibility of sharing box subformulas in the DAG representation of the input $K_m$-formula is reduced.
In order to cope with this problem we provide an alternative policy for applying box lifting, that is, to apply the rules (10) only when neither box subformula occurring in the implicant in (10) has multiple occurrences.We call this policy controlled box lifting.
Example 4.4 (Controlled Box Lifting). We apply Controlled Box Lifting to the formula of Example 4.1, obtaining the formula $\varphi_{bnfclift}$: the rules (10) are applied among all the box subformulas except for $\Box\neg A_1$, which has multiple occurrences. $K_m2SAT(\varphi_{bnfclift})$ is found unsatisfiable directly by BCP on clauses 1., 2. and 3. Notice that the unit propagation of $A_{\langle 1, \Box\neg A_1\rangle}$ and $A_{\langle 1, \Box(\neg A_2 \wedge \neg A_3)\rangle}$ from 2. causes the implicate disjunction in 3. to be false. ◇

On-the-fly Boolean Simplification and Truth Propagation
A first straightforward on-the-fly optimization is that of recursively applying the standard rewriting rules for the Boolean simplification of the formula, and for the propagation of truth/falsehood through Boolean operators.
One important subcase of on-the-fly Boolean simplification avoids the useless encoding of incompatible $\pi^r$ and $\nu^r$ formulas. In BNF, in fact, the same subformula $\Box_r\psi$ may occur in the same state $\sigma$ both positively and negatively (like $\pi^r = \neg\Box_r\psi$ and $\nu^r = \Box_r\psi$). If so, $K_m2SAT$ labels both those occurrences of $\Box_r\psi$ with the same Boolean atom $A_{\langle\sigma, \Box_r\psi\rangle}$, and produces recursively two distinct subsets of clauses in the encoding, by applying (8) to $\neg\Box_r\psi$ and (9) to $\Box_r\psi$ respectively. However, the latter step (9) generates a valid clause $(A_{\langle\sigma, \Box_r\psi\rangle} \wedge \neg A_{\langle\sigma, \Box_r\psi\rangle}) \to A_{\langle\sigma.i, \psi\rangle}$, so that we can avoid generating it. Consequently, if $A_{\langle\sigma.i, \psi\rangle}$ no longer occurs in the formula, then $Def(\sigma.i, \psi)$ should not be generated, as there is no more need of defining $\langle\sigma.i, \psi\rangle$.¹¹
Example 4.6. If we apply this observation in the construction of the formulas of Examples 4.1 and 4.4, we have the following facts:
• In the formula $K_m2SAT(\varphi_{bnf})$ of Example 4.1, clause 6. is valid and thus it is dropped.
• In the formula $K_m2SAT(\varphi_{bnfclift})$ of Example 4.4, both valid clauses 6. and 9. are dropped, so that 12. is not generated. ◇
Hereafter we assume that on-the-fly Boolean simplification is applied also in combination with the techniques described in the next sections.
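A possible shape for such on-the-fly simplification is a "smart constructor" which is called whenever a conjunction or disjunction is built. The sketch below is only an assumption about which rules are meant: it uses common textbook ones (idempotence, complementary operands, and propagation of ⊤/⊥). The constants `TOP` and `BOT`, and the names `negate` and `simplify`, are illustrative.

```python
TOP, BOT = ("top",), ("bot",)

def negate(psi):
    """Negation with truth constants folded and double negations cancelled."""
    if psi == TOP:
        return BOT
    if psi == BOT:
        return TOP
    return psi[1] if psi[0] == "not" else ("not", psi)

def simplify(op, operands):
    """Build an n-ary 'and'/'or' while applying Boolean simplifications."""
    unit, zero = (TOP, BOT) if op == "and" else (BOT, TOP)
    kept = []
    for psi in operands:
        if psi == zero:
            return zero              # e.g. (bot and psi) => bot, (top or psi) => top
        if psi == unit or psi in kept:
            continue                 # e.g. (top and psi) => psi, (psi and psi) => psi
        if negate(psi) in kept:
            return zero              # e.g. (psi and not psi) => bot
        kept.append(psi)
    if not kept:
        return unit
    return kept[0] if len(kept) == 1 else (op, *kept)
```

Because the constructor returns a constant as soon as one is determined, truth values produced deep inside a formula propagate upwards without any separate pass.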

On-the-fly Truth Propagation Through Modal Operators
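Truth and falsehood (which can derive from the application of the techniques in Section 4.5, Section 4.7 and Section 4.8) may be propagated on-the-fly also through modal operators. First, for every $\sigma$, both positive and negative instances of $\langle\sigma, \Box_r\top\rangle$ can be safely simplified by applying the rewriting rule $\langle\sigma, \Box_r\top\rangle \Longrightarrow \langle\sigma, \top\rangle$. Second, we notice the following fact. When we have a positive occurrence of $\langle\sigma, \neg\Box_r\bot\rangle$ for some $\sigma$ (we suppose wlog. that we have only that $\pi^r$-formula for $\sigma$),¹² by definition of (8) and (9) we have

$$Def(\sigma, \neg\Box_r\bot) = (L_{\langle\sigma, \neg\Box_r\bot\rangle} \to A_{\langle\sigma.j, \top\rangle}) \wedge Def(\sigma.j, \top) \qquad (11)$$
$$Def(\sigma, \Box_r\psi) = ((L_{\langle\sigma, \Box_r\psi\rangle} \wedge L_{\langle\sigma, \neg\Box_r\bot\rangle}) \to L_{\langle\sigma.j, \psi\rangle}) \wedge Def(\sigma.j, \psi) \qquad (12)$$

for some new label $\sigma.j$ and for every $\Box_r\psi$ occurring positively in $\sigma$. $Def(\sigma, \neg\Box_r\bot)$ reduces to $\top$ because both $A_{\langle\sigma.j, \top\rangle}$ and $Def(\sigma.j, \top)$ reduce to $\top$. If at least another distinct π-formula $\neg\Box_r\varphi$ occurs positively in $\sigma$, however, there is no need for the $\sigma.j$ label in (11) and (12) to be a new label, and we can re-use instead the label $\sigma.i$ introduced in the expansion of $Def(\sigma, \neg\Box_r\varphi)$, as follows:

$$Def(\sigma, \neg\Box_r\varphi) = (L_{\langle\sigma, \neg\Box_r\varphi\rangle} \to L_{\langle\sigma.i, \neg\varphi\rangle}) \wedge Def(\sigma.i, \neg\varphi) \qquad (13)$$

Thus (11) is dropped and, for every $\langle\sigma, \Box_r\psi\rangle$ occurring positively, we write:

$$((L_{\langle\sigma, \Box_r\psi\rangle} \wedge L_{\langle\sigma, \neg\Box_r\bot\rangle}) \to L_{\langle\sigma.i, \psi\rangle}) \wedge Def(\sigma.i, \psi) \qquad (14)$$

instead of (12). (Notice the label $\sigma.i$ introduced in (13) rather than the label $\sigma.j$ of (11).) This is motivated by the fact that $Def(\sigma, \neg\Box_r\bot)$ forces the existence of at least one successor of $\sigma$ but imposes no constraints on which formulas should hold there, so that we can use some other already-defined successor state, if any. This fact has the important benefit of eliminating useless successor states from the encoding.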
Example 4.7. Let ϕ be the BNF K-formula: Clause 11. is then simplified into $\top$. (In a practical implementation it is not even generated.) Notice that in clauses 11., 12. and 13. the label 1.1 of clauses 8., 9. and 10. is used rather than a new label 1.2. Thus, only one successor label is generated.
When DPLL is run on $K_m2SAT(\varphi)$, by BCP 1. and 2. are immediately satisfied and the implicants are removed from 3., 4., 5., 6. Thanks to 5. and 6., $A_{\langle 1, A_1\rangle}$ can be assigned only to false, which causes 3. to be satisfied and forces the assignment of the literals $\neg A_{\langle 1, \Box\bot\rangle}$, $A_{\langle 1, \Box\neg A_4\rangle}$ by BCP on 3. and 7. and hence of $\neg A_{\langle 1.1, \bot\rangle}$, $\neg A_{\langle 1.1, A_4\rangle}$ and $A_{\langle 1.1, A_4\rangle}$ by BCP on 12. and 13., causing a contradiction. ◇
It is worth noticing that (14) is strictly necessary for the correctness of the encoding even when another π-formula occurs in $\sigma$. (E.g., in Example 4.7, without 12. and 13. the formula $K_m2SAT(\varphi)$ would become satisfiable because $A_{\langle 1, \Box A_2\rangle}$ could safely be assigned to true by DPLL, which would satisfy 8., 9. and 10.) Hereafter we assume that this technique is applied also in combination with the techniques described in Section 4.5 and in the next sections.

On-the-fly Pure-Literal Reduction
Another technique, evolved from that proposed by Pan and Vardi (2003), applies Pure-Literal Reduction (PLR) on-the-fly during the construction of $K_m2SAT(\varphi)$. When for a label $\sigma$ all the clauses containing atoms in the form $A_{\langle\sigma, \psi\rangle}$ have been generated, if some of them occurs only positively [resp. negatively], then it can be safely assigned to true [resp. to false], and hence the clauses containing $A_{\langle\sigma, \psi\rangle}$ can be dropped.¹³ As a consequence, some other atom $A_{\langle\sigma, \psi'\rangle}$ can become pure, so that the process is repeated until a fixpoint is reached.
Example 4.8. Consider the formula $\varphi_{bnf}$ of Example 4.1. During the construction of $K_m2SAT(\varphi_{bnf})$, after 1.-8. are generated, no more clauses containing atoms in the form $A_{\langle 1.1, \psi\rangle}$ are to be generated. Then we notice that $A_{\langle 1.1, A_2\rangle}$ and $A_{\langle 1.1, A_3\rangle}$ occur only negatively, so that they can be safely assigned to false. Therefore, 7. and 8. can be safely dropped. The same applies later to $A_{\langle 1.2, A_1\rangle}$ and 9. The resulting formula is found inconsistent by BCP. (In fact, notice from Example 4.1 that the atoms $A_{\langle 1.1, A_2\rangle}$, $A_{\langle 1.1, A_3\rangle}$, and $A_{\langle 1.2, A_1\rangle}$ play no role in the unsatisfiability of $K_m2SAT(\varphi_{bnf})$.) ◇
We remark on the differences between PLR and the Pure-Literal Reduction technique proposed by Pan and Vardi (2003). In KBDD (Pan et al., 2002; Pan & Vardi, 2003), the Pure-Literal Reduction is a preprocessing step which is applied to the input modal formula, either at global level (i.e. looking for pure-polarity primitive propositions for the whole formula) or, more effectively, at different modal depths (i.e. looking for pure-polarity primitive propositions for the subformulas at the same nesting level of modal operators).
Our technique is much more fine-grained, as PLR is applied on-the-fly with a single-state granularity, obtaining a much stronger reduction effect.
Example 4.9. Consider again the BNF $K_m$-formula $\varphi_{bnf}$ discussed in Examples 4.1 and 4.8. It is immediate to see that all primitive propositions $A_1$, $A_2$, $A_3$ occur at every modal depth with both polarities, so that the technique of Pan and Vardi (2003) produces no effect on this formula. ◇
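The fixpoint computation behind the on-the-fly PLR can be sketched at the clause level as follows. The sketch is an illustration, not the authors' implementation: clauses are lists of non-zero integers, and the hypothetical parameter `candidates` stands for the set of Boolean variables belonging to a label whose clauses have all been generated. How the encoder tracks that set, and any side condition stated in the paper's footnote, are outside the sketch.

```python
def pure_literal_reduction(clauses, candidates):
    """Drop the clauses containing a pure candidate variable, until a fixpoint.

    Returns the remaining clauses and the truth values given to pure variables.
    """
    clauses = [list(c) for c in clauses]
    assigned = {}
    while True:
        polarity = {}                                   # variable -> set of signs seen
        for clause in clauses:
            for l in clause:
                if abs(l) in candidates:
                    polarity.setdefault(abs(l), set()).add(l > 0)
        pure = {v: next(iter(signs)) for v, signs in polarity.items() if len(signs) == 1}
        if not pure:
            return clauses, assigned
        assigned.update(pure)
        # A pure variable occurs with one sign only, so every clause containing it is satisfied.
        clauses = [c for c in clauses if not any(abs(l) in pure for l in c)]
```

In the first test below, variable 3 is not pure at the beginning; it becomes pure only after the clauses containing the pure variable 2 have been dropped, which is why the loop is iterated until nothing changes.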

On-the-fly Boolean Constraint Propagation
One major problem of the basic encoding of Section 3 is that it is "purely-syntactic", that is, it does not consider the possible truth values of the subformulas, and the effect of their propagation through the Boolean and modal connectives. In particular, Km2SAT applies (8) [resp. (9)] to every π-subformula [resp. ν-subformula], regardless of the fact that the truth values which can be deterministically assigned to the labeled subformulas of ⟨1, ϕ⟩ may allow for dropping some labeled π-/ν-subformulas, and thus remove the need to encode them.
One solution to this problem is that of applying Boolean Constraint Propagation (BCP) on-the-fly during the construction of Km2SAT(ϕ), starting from the fact that A⟨1, ϕ⟩ must be true. If a contradiction is found, then Km2SAT(ϕ) is unsatisfiable, so that the formula is not expanded any further, and the encoder returns the formula "⊥".14 When BCP allows for dropping one implication in (6)-(9) without assigning some of its implicate literals, namely L⟨σ, ψ_i⟩, then ⟨σ, ψ_i⟩ need not be defined, so that Def(σ, ψ_i) must not be expanded.15 Importantly, dropping Def(σ, π_{r,j}) for some π-formula ⟨σ, π_{r,j}⟩ prevents generating the label σ.j (8) and all its successor labels σ.j.σ' (corresponding to the subtree of states rooted in σ.j), so that all the corresponding labeled subformulas are not encoded.
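The propagation step itself can be sketched as plain unit propagation over a clause list. This is only an illustration under stated assumptions: clauses are sets of DIMACS-style integer literals, the function name `unit_propagate` is hypothetical, and the sketch works on an already generated clause list, whereas the encoder interleaves propagation with clause generation so that dropped implications never trigger the expansion of the corresponding Def(σ, ψ).

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Tuple


def unit_propagate(
    clauses: Iterable[Iterable[int]],
) -> Optional[Tuple[List[FrozenSet[int]], Dict[int, bool]]]:
    """Return (simplified clauses, forced assignment), or None on contradiction."""
    pending = [frozenset(c) for c in clauses]
    assignment: Dict[int, bool] = {}
    while True:
        simplified: List[FrozenSet[int]] = []
        unit = None
        for clause in pending:
            # A clause with a true literal is satisfied and can be dropped.
            if any(assignment.get(abs(l)) == (l > 0) for l in clause):
                continue
            # Remove literals already assigned to false.
            reduced = frozenset(l for l in clause if abs(l) not in assignment)
            if not reduced:
                return None  # every literal is false: the formula is "⊥"
            if len(reduced) == 1 and unit is None:
                unit = next(iter(reduced))
            simplified.append(reduced)
        if unit is None:
            return simplified, assignment
        assignment[abs(unit)] = unit > 0  # deterministic assignment
        pending = simplified
```

Starting from the unit clause for the top-level atom, `unit_propagate([{1}, {-1, 2}, {-2, 3, 4}])` forces atoms 1 and 2 to true and leaves only the clause {3, 4}; a `None` result corresponds to the encoder returning "⊥".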
14. For the sake of compatibility with standard SAT solvers, our actual implementation returns the formula A_1 ∧ ¬A_1.
15. Here we make the same consideration as in Footnote 11: if L⟨σ.j, ψ⟩ is generated also from the expansion of some other subformula (e.g., □_r(ψ ∨ φ)), then (another instance of) Def(σ.i, ψ) must be generated anyway.
Example 4.10. Consider Example 4.1, and suppose we apply on-the-fly BCP. During the construction of 1., 2. and 3., [...] and A⟨1, □¬A_3⟩ are deterministically assigned to true by BCP. This causes the removal from 3. of the first-implied disjunct ¬A⟨1, □¬A_1⟩, so that there is no need to generate Def(1, ¬□¬A_1), and hence label 1.1 is not defined and 4. is not generated. While building 5., A⟨1.2, (¬A_2 ∧ ¬A_3)⟩ is unit-propagated. As label 1.1 is not defined, 6., 7. and 8. are not generated. Then during the construction of 5., 9., 10., 11. and 12., by applying BCP a contradiction is found, so that Km2SAT(ϕ) is ⊥.
Among all optimizations described in this Section 4, on-the-fly BCP is by far the most effective. In order to better understand this fact, we consider as a paradigmatic example the branching formulas ϕ^K_h by Halpern and Moses (1992, 1995) (also called "k_branch_n" in the set of benchmark formulas proposed by Heuerding and Schwendimann, 1996) and their unsatisfiable version (called "k_branch_p" in the above-mentioned benchmark suite).
Given a single modality □, an integer parameter h, and the primitive propositions D_0, ..., D_{h+1}, P_1, ..., P_h, the formulas ϕ^K_h are defined as follows:16
A conjunction of the formulas depth, determined and branching is repeated at every nesting level of modal operators (i.e., at every depth): depth captures the relation between the D_i's at every level; determined states that, if P_i is true [false] in a state at depth ≥ i, then it is true [false] in all the successor states of depth ≥ i; branching states that, for every node at depth i, it is possible to find two successor states at depth i + 1 such that P_{i+1} is true in one and false in the other. For each value of the parameter h, ϕ^K_h is K-satisfiable, and every Kripke model M that satisfies it has at least 2^(h+1) − 1 states. In fact, ϕ^K_h is built in such a way as to force the construction of a binary-tree Kripke model of depth h + 1, each of whose leaves encodes a distinct truth assignment to the primitive propositions P_1, ..., P_h, whilst each D_i is true in all and only the states occurring at a depth ≥ i in the tree (and thus denotes the level of nesting).
The unsatisfiable counterpart formulas proposed by Heuerding and Schwendimann (1996) (whose negations are the valid formulas called k_branch_p in the previously-mentioned benchmark suite, which are described in more detail in Section 5.1.1) are obtained by conjoining to (15) the formula (19) (where ⌊x⌋ is the integer part of x), which forces the atom P_{⌊h/3⌋+1} to be true in all depth-h states of the candidate Kripke model; this is incompatible with the fact that the remaining specifications say that it has to be false in half of the depth-h states.17
These formulas are very pathological for many approaches (Giunchiglia & Sebastiani, 2000; Giunchiglia, Giunchiglia, Sebastiani, & Tacchella, 2000; Horrocks et al., 2000). In particular, before introducing on-the-fly BCP, they used to be the pet hate of our Km2SAT approach, as they caused the generation of huge Boolean formulas. In fact, due to branching (18), ϕ^K_h contains 2h ◇-formulas (i.e., π-formulas) at every depth. Therefore, the Km2SAT encoder of Section 3 has to consider 1 + 2h + (2h)^2 + ... + (2h)^(h+1) = ((2h)^(h+2) − 1)/(2h − 1) distinct labels, which is about h^(h+1) times the number of those labeling the states which are actually needed. (None of the optimizations of Sections 4.1-4.7 is of any help with these formulas, because neither BNF encoding nor atom normalization causes any sharing of subformulas, the formulas are already in lifted form, and no literal occurs pure.18) This pathological behavior can be mostly overcome by applying on-the-fly BCP, because some truth values can be deterministically assigned to some subformulas of ϕ^K_h by on-the-fly BCP, which prevents encoding some or even most □/◇-subformulas.
In fact, consider the branching and determined formulas occurring in ϕ^K_h at a generic depth d ∈ {0, ..., h}, which determine the states at level d in the tree. As in these states D_0, ..., D_d are forced to be true and D_{d+1}, ..., D_{h+1} are forced to be false, all but the d-th conjunct in branching (all conjuncts if d = h) are forced to be true and thus they could be dropped. Therefore, only 2 ◇-formulas per non-leaf level need to be considered, causing the generation of 2^(h+1) − 1 labels overall. Similarly, in all states at level d the last h − d conjuncts in determined are forced to be true and could be dropped, reducing significantly the number of □-formulas to be considered.
It is easy to see that this is exactly what happens by applying on-the-fly BCP. In fact, suppose that the construction of Km2SAT(ϕ^K_h) has reached depth d (that is, the point where for every state σ at level d, the Def(σ, α)'s and Def(σ, β)'s are expanded but no Def(σ, π) and Def(σ, ν) is expanded yet). Then, BCP deterministically assigns true to the literals L⟨σ, D_0⟩, ..., L⟨σ, D_d⟩ and false to L⟨σ, D_{d+1}⟩, ..., L⟨σ, D_{h+1}⟩, which removes all but one conjunct in branching, so that only two Def(σ, π)'s out of 2h are actually expanded; similarly, the last h − d conjuncts in determined are removed, so that the corresponding Def(σ, ν)'s are not expanded.
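The two label counts discussed above can be compared numerically with a few lines of Python. This is only an arithmetic illustration of the formulas stated in the text; the function names are hypothetical.

```python
def labels_basic(h: int) -> int:
    """1 + 2h + (2h)^2 + ... + (2h)^(h+1): labels considered by the basic encoder."""
    return sum((2 * h) ** i for i in range(h + 2))


def labels_needed(h: int) -> int:
    """2^(h+1) - 1: labels generated once all but two pi-formulas per level are dropped."""
    return 2 ** (h + 1) - 1


for h in (2, 4, 8):
    basic, needed = labels_basic(h), labels_needed(h)
    # The ratio grows roughly like h^(h+1), as stated in the text.
    print(h, basic, needed, basic / needed, h ** (h + 1))
```

The sum agrees with the closed form ((2h)^(h+2) − 1)/(2h − 1); the dominant term (2h)^(h+1) divided by 2^(h+1) gives the factor h^(h+1).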

17. Heuerding and Schwendimann do not explain the choice of the index "⌊h/3⌋ + 1". We understand that other choices would also have done the job.
18. More precisely, only one literal, ¬D_{h+1}, occurs pure in branching, but assigning it plays no role in simplifying the formula.
Figure 2: Empirical analysis of Km2SAT on the branching formulas: satisfiable version (1st row) and unsatisfiable version (2nd row). Left: number of Boolean variables; center: number of clauses; right: total CPU time required for encoding+solving (where the solving step has been performed through Rsat). See Section 5 for more technical details.

As far as the unsatisfiable version of ϕ^K_h is concerned, when the expansion reaches depth h, thanks to (19), L⟨σ, P_{⌊h/3⌋+1}⟩ is generated and deterministically assigned to true by BCP for every depth-h label σ; thanks to determined and branching, BCP assigns all literals L⟨σ, P_1⟩, ..., L⟨σ, P_h⟩ deterministically, so that L⟨σ, P_{⌊h/3⌋+1}⟩ is assigned to false for 50% of the depth-h labels σ. This causes a contradiction, so that the encoder stops the expansion and returns ⊥.
Figure 2 shows the growth in size and the CPU time required to encode and solve ϕ^K_h (1st row) and its unsatisfiable version (2nd row) wrt. the parameter h, for eight combinations of the following options of the encoder: with and without box-lifting, with and without on-the-fly PLR, with and without on-the-fly BCP. (Notice the log scale of the y axis.) In Figure 2(d) the plots of the four versions "-xxx-bcp" (with on-the-fly BCP) coincide with the line of value 1 (i.e., one variable) and in Figure 2(e) they coincide with a horizontal line of value 2 (i.e., two clauses), corresponding to the fact that the 1-variable/2-clause formula A_1 ∧ ¬A_1 is returned (see Footnote 14).
We notice a few facts. First, for both formulas, the eight plots always collapse into two groups of overlapping plots, representing the four variants with and without on-the-fly BCP respectively. This shows that box-lifting and on-the-fly PLR are almost irrelevant in the encoding of these formulas, causing just small variations in the time required by the encoder (Figures 2(c) and 2(f)); notice that enabling on-the-fly PLR alone makes it possible to encode (but not to solve) only one more problem than the versions without both on-the-fly PLR and BCP. Second, the four versions with on-the-fly BCP always outperform by several orders of magnitude those without this option, in terms of both the size of the encoded formulas and the CPU time required to encode and solve them. In particular, in the case of the unsatisfiable variant (Figure 2, second row) the encoder returns the ⊥ formula, so that no actual work is required of the SAT solver (the plot of Figure 2(f) refers only to encoding time).

Empirical Evaluation
In order to verify empirically the effectiveness of this approach, we have performed a very extensive empirical test session on about 14,000 Km/ALC formulas. We have implemented the encoder Km2SAT in C++, with some flags corresponding to the optimizations presented in the previous section: (i) NNF/BNF, performing a pre-conversion into NNF/BNF before the encoding; (ii) lift/ctrl.lift/nolift, performing respectively Box Lifting, Controlled Box Lifting or no Box Lifting before the encoding; (iii) plr if on-the-fly Pure Literal Reduction is performed and (iv) bcp if on-the-fly Boolean Constraint Propagation is performed. The techniques introduced in Section 4.2, Section 4.5 and Section 4.6 are hardwired in the encoder. Moreover, as pre-conversion into BNF almost always produces smaller formulas than NNF, we have set the BNF flag as a default.
After a preliminary evaluation and further intensive experiments we have selected Rsat 1.03 (Pipatsrisawat & Darwiche, 2006), because it produced the best overall performances on our benchmark suites (although the performance gaps wrt. other SAT tools, e.g. MiniSat 2.0, were not dramatic).
We have downloaded the available versions of the state-of-the-art tools for Km-satisfiability. After an empirical evaluation20 we have selected Racer 1-7-24 (Haarslev & Moeller, 2001) and *SAT 1.3 (Tacchella, 1999) as the best representatives of the tableaux/DPLL-based tools, Mspass v 1.0.0t.1.3 (Hustadt & Schmidt, 1999; Hustadt et al., 1999)21 as the best representative of the FOL-encoding approach, and KBDD (unique version) (Pan et al., 2002; Pan & Vardi, 2003)22 as the representative of the automata-theoretic approach. No representative of the CSP-based and of the inverse method approaches could be used.23 Notice that all these tools but Racer are experimental tools, as is Km2SAT, which is a prototype, and many of them (e.g. *SAT and KBDD) are no longer maintained.
Finally, as representative of the QBF-encoding approach, we have selected the K-QBF translator (Pan & Vardi, 2003) combined with the sKizzo version 0.8.2 QBF solver (Benedetti, 2005), which turned out to be by far24 the best QBF solver on our benchmarks among the freely-available QBF solvers from the QBF2006 competition (Narizzano, Pulina, & Tacchella, 2006). (In our evaluation we have considered the tools 2clsQ, SQBF, preQuantor (i.e., preQuel+Quantor), Quantor 2.11, and Semprop 010604.)
All tests presented in this section have been performed on a two-processor Intel Xeon 3.0GHz computer, with 1 MByte cache per processor, 4 GByte RAM, with Red Hat Linux 3.0 Enterprise Server, where four processes can run in parallel. When reporting the results for one Km2SAT+Rsat version, the CPU times reported are the sums of both the encoding and Rsat solving times. When reporting the results for K-QBF+sKizzo, the CPU times reported are only due to sKizzo because the time spent by the K-QBF converter is negligible. We anticipate that, for all formulas of all benchmark suites, all tools under test (i.e., all the variants of Km2SAT+Rsat and all the state-of-the-art Km-satisfiability solvers) agreed on the satisfiability/unsatisfiability result when terminating within the timeout.
19. In the preliminary evaluation of the available SAT solvers we have also tried SAT-Elite as a preprocessor to reduce the size of the SAT formula generated by Km2SAT without the bcp option before solving it. However, even if the preprocessing can significantly reduce the size of the formula, it has turned out that this preprocessing is too time-expensive and that the overall time spent for preprocessing and then solving the reduced problem is higher than that of solving directly the original encoded SAT formula.
20. As we did for the selection of the SAT solver, in order to select the tools to be used in the empirical evaluation, we have performed a preliminary evaluation on the smaller benchmark suites (i.e., the LWB and, sometimes, the TANCS 2000 ones; see later). Importantly, from this preliminary evaluation Racer turned out to be definitely more efficient than FaCT++, being able to solve more problems in less time. Also, in order to meet the reviewers' suggestions, we repeated this preliminary evaluation with the latest versions of FaCT++ (v1.2.3, March 5th, 2009) and the same version of Racer used in this paper. In this evaluation Racer solves ten more problems than FaCT++ on the LWB benchmark, and more than one hundred more problems than FaCT++ on the whole TANCS 2000 suite. Also on □m-CNF random problems Racer outperforms FaCT++. (We include in the online appendix the plots of this comparison between Racer and FaCT++.)
21. We have run Mspass with the options -EMLTranslation=2 -EMLFuncNary=1 -Sorts=0 -CNFOptSkolem=0 -CNFStrSkolem=0 -Select=2 -Split=-1 -DocProof=0 -PProblem=0 -PKept=0 -PGiven=0, which are suggested for Km-formulas in the Mspass README file. We have also tried other options, but the former gave the best performances.
22. KBDD has been recompiled to be run with an increased internal memory bound of 1 GB.
23. At the moment K K is not freely available, and we failed in the attempt of obtaining it from the authors. KCSP is a Prolog piece of software, which is difficult to compare in performances wrt. other optimized tools on a common platform; moreover, KCSP has not been maintained since 2005, and it is not competitive wrt. state-of-the-art tools (Brand, 2008). Other tools like leanK, □KE, LWB, Kris are not competitive with the ones listed above (Horrocks et al., 2000). KSAT (Giunchiglia & Sebastiani, 1996, 2000; Giunchiglia et al., 2000) has been reimplemented into *SAT.
24. Unlike with the choice of SAT solver, the performance gaps between the best choice and the others were very significant: e.g., in the LWB benchmark (see later), sKizzo was able to solve nearly 90 problems more than its best QBF competitor.
Remark 1. Due to the large number of empirical tests performed and to the huge amount of data plotted, due to limitations in size, and in order to make the plots clearly distinguishable in the figures, we have limited the number of plots included in the following part of the paper, considering only the most meaningful ones and those regarding the most challenging benchmark problems faced. For the sake of the reader's convenience, however, full-size versions of all plots and many other plots regarding the results not presented here (also on the easier problems) are available in the online appendix, together with the files with all data. When discussing the empirical evaluation we may include in our considerations also these results.

Test Description
We have performed our empirical evaluation on three different well-known benchmark suites of Km/ALC problems: the LWB (Heuerding & Schwendimann, 1996), the random □m-CNF (Horrocks et al., 2000; Patel-Schneider & Sebastiani, 2003) and the TANCS 2000 (Massacci & Donini, 2000) benchmark suites. We are not aware of any other publicly available benchmark suite on Km/ALC-satisfiability from the literature. These three groups of benchmark formulas allow us to test the effectiveness of our approach on a large number of problems of various sizes, depths, hardness and characteristics, for a total amount of about 14,000 formulas.
In particular, these benchmark formulas allow us to fairly evaluate the different tools both on the modal component and on the Boolean component of reasoning which are intrinsic in the K m -satisfiability problem, as we discuss later in Section 5.4.
In the following we describe these three benchmark suites.

The LWB Benchmark Suite
As a first group of benchmark formulas we used the LWB benchmark suite used in a comparison at Tableaux'98 (Heuerding & Schwendimann, 1996). It consists of 9 classes of parametrized formulas (each in two versions, provable "_p" or not-provable "_n"25), for a total amount of 378 formulas. The parameter allows for creating formulas of increasing size and difficulty. The benchmark methodology is to test formulas from each class, in increasing difficulty, until one formula cannot be solved within a given timeout, 1000 seconds in our tests.26 The result from this class is the parameter's value of the largest (and hardest) formula that can be solved within the time limit. The parameter ranges only from 1 to 21 so that, if a system can solve all 21 instances of a class, the result is given as 21. For a discussion on this benchmark suite, we refer the reader to the work of Heuerding and Schwendimann (1996) and of Horrocks et al. (2000).
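The scoring rule of this methodology can be sketched as a short loop. The sketch assumes a hypothetical callable `solve(n, timeout)` that runs a tool on the instance with parameter n and returns the satisfiability result, or None on timeout; neither this interface nor the name `lwb_score` comes from the benchmark suite itself.

```python
from typing import Callable, Optional


def lwb_score(
    solve: Callable[[int, float], Optional[bool]],
    timeout: float = 1000.0,
    max_param: int = 21,
) -> int:
    """Largest parameter value solved before the first timeout (0 if none)."""
    score = 0
    for n in range(1, max_param + 1):
        if solve(n, timeout) is None:  # None models a timeout (hypothetical convention)
            break  # harder instances of the class are not attempted
        score = n
    return score
```

A tool that solves every instance of a class scores 21, the upper bound of the parameter range.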

The Random □m-CNF Benchmark Suite
As a second group of benchmark formulas, we have selected the random □m-CNF testbed described by Horrocks et al. (2000), and Patel-Schneider and Sebastiani (2003). This is a generalization of the well-known random k-SAT test methods, and is the final result of a long discussion in the communities of modal and description logics on how to obtain significant and flawless random benchmarks for modal/description logics (Giunchiglia & Sebastiani, 1996; Hustadt & Schmidt, 1999; Giunchiglia et al., 2000; Horrocks et al., 2000; Patel-Schneider & Sebastiani, 2003).
In the □m-CNF test methodology, a □m-CNF formula is randomly generated according to the following parameters:
• the (maximum) modal depth d;
• the number of top-level clauses L;
• the number of literals per clause k;
• the number of distinct propositional variables N;
(We refer the reader to the works of Horrocks et al., 2000, and Patel-Schneider & Sebastiani, 2003, for a more detailed description.)
A typical problem set is characterized by fixed values of d, k, N, m, and p: L is varied in such a way as to empirically cover the "100% satisfiable / 100% unsatisfiable" transition. In other words, many problems with the same values of d, k, N, m, and p but an increasing number of clauses L are generated, starting from really small, typically satisfiable problems (i.e., with a probability of generating a satisfiable problem near to one) to huge problems, where the increasing interactions among the numerous clauses typically lead to unsatisfiable problems (i.e., they make the probability of generating satisfiable problems converge to zero). Then, for each tuple of the five values in a problem set, a certain number of □m-CNF formulas are randomly generated, and the resulting formulas are given as input to the procedure under test, with a maximum time bound. The fraction of formulas which were solved within a given timeout, and the median/percentile values of CPU times are plotted against the ratio L/N. Also, the fraction of satisfiable/unsatisfiable formulas is plotted for a better understanding.
In each test, we imposed a timeout of 500 seconds per sample28 and we calculated the number of samples which were solved within the timeout, and the 50th and 90th percentiles of CPU time.29 In order to correlate the performances with the (un)satisfiability of the sample formulas, in the background of each plot we also plot the satisfiable/unsatisfiable ratio.
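The per-point statistics described above can be computed as in the following sketch. It makes two assumptions that are not stated in the text: percentiles use the nearest-rank definition, and a timed-out sample (marked None) is counted with the timeout value. The function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def percentile(sorted_values: List[float], q: float) -> float:
    """Nearest-rank percentile (0 < q <= 100) of a sorted, non-empty list."""
    rank = max(1, math.ceil(q / 100.0 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(times: List[Optional[float]], timeout: float = 500.0) -> Tuple[int, float, float]:
    """Return (#solved, 50th percentile, 90th percentile) for one problem set."""
    solved = sum(t is not None for t in times)
    # Assumption: a timed-out sample contributes the timeout value itself.
    capped = sorted(timeout if t is None else t for t in times)
    return solved, percentile(capped, 50), percentile(capped, 90)
```

With this convention, a 90th percentile equal to the timeout signals that at least 10% of the samples at that L/N ratio were not solved.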

The TANCS 2000 Benchmark Suite
Finally, as a third group of benchmark formulas, we used the MODAL PSPACE division benchmark suite used in the comparison at TANCS 2000 (Massacci & Donini, 2000). It contains both satisfiable and unsatisfiable formulas, with scalable hardness. In this benchmark suite, which we call TANCS 2000, the formulas are constructed by translating QBF formulas into K using three translation schemas, namely the Schmidt-Schauss-Smolka translation (240 problems with many different depths, from 19 to 112), the Ladner translation (240 problems, again with depths in the same range 19-112), and the Halpern translation (56 problems of depth among: 20, 28, 40, 56, 80 or 112) (Massacci & Donini, 2000). As done by Massacci and Donini, we call these classes easy, medium and hard respectively.
All formulas from each class are tested within a timeout of 1000 seconds.30 For each class, we report the number of solved formulas (X axis) and the total (cumulative) CPU time spent for solving these formulas (Y axis). For each class the results are plotted sorting the solved problems from the easiest one to the hardest one.
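The points of such a cumulative plot can be obtained as in the following sketch, which assumes a list of CPU times for the solved problems only; the function name `cumulative_curve` is hypothetical.

```python
from itertools import accumulate
from typing import List, Tuple


def cumulative_curve(solved_times: List[float]) -> List[Tuple[int, float]]:
    """Points (k, total CPU time spent on the k easiest solved problems)."""
    totals = accumulate(sorted(solved_times))  # running sum, easiest first
    return list(enumerate(totals, start=1))
```

Each point pairs a number of solved problems (X axis) with the total time needed to solve that many problems (Y axis), so a curve that extends further right and stays lower indicates a better performer.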

An Empirical Comparison of the Different Variants of K m 2SAT
We have first evaluated the various variants of the encoding in combination with Rsat. In order to avoid considering too many combinations of the flags, we have considered the BNF format, and we have grouped plr and bcp into one parameter plr-bcp, thus restricting our investigation to 6 combinations: BNF, lift/ctrl.lift/nolift, and plr-bcp on/off. (We recall that the techniques introduced in Section 4.2, Section 4.5 and Section 4.6 are hardwired in the encoder.) Here we present and analyze the results wrt. the three different suites of benchmark problems.
28. With also a 512 MB file-size limit for the encoding produced by Km2SAT.
29. Due to the lack of space and for the sake of clarity we do not include in the paper the 90th percentile plots. Further, for the same reasons, we omit the plots regarding some of the easiest classes of the benchmark suite (e.g. those with d = 1 and lower values of N). All of these plots, however, can be found in the online appendix.
30. We also set a 1 GB file-size limit for the encoding produced by Km2SAT.

Results on the LWB Benchmark Suite
The results on the LWB benchmark suite are summarized in Table 1 and Figure 3.
Table 1(a) reports in the left block the indexes of the hardest formulas encoded within the file-size limit and, in the right block, those of the hardest formulas solved within the timeout by Rsat; Table 1(b) reports the numbers of variables and clauses of Km2SAT(ϕ), referring to the hardest formulas solved within the timeout by Rsat (i.e., those reported in the right block of Table 1(a)). For instance, the BNF-ctrl.lift-plr-bcp encoding of k_dum_n(21) contains 11·10^6 variables and 14·10^6 clauses; it is the hardest k_dum_n problem solved by Rsat with BNF-ctrl.lift-plr-bcp and it is the first which is not solved with BNF-ctrl.lift.
Looking at the numbers of cases solved in Table 1(a), we notice that the introduction of the on-the-fly Pure Literal Reduction and Boolean Constraint Propagation optimizations is really effective and produces a consistent performance enhancement (the effect of these optimizations is eye-catching in the branching formulas k_branch_* (see Section 4.9) and in the k_path_* formulas). We also notice that lift sometimes introduces some slight further improvement.
The view of Tables 1(a) and 1(b) hides the actual CPU times required to encode and solve the problems. Small gaps in the numbers of Table 1(a) may correspond to big gaps in CPU time. In order to analyze also this aspect, in Figure 3 we plotted the total cumulative amount of CPU time spent by all the variants of Km2SAT+Rsat to solve all the problems of the LWB benchmark, sorted by hardness. For this plot, we also considered three more options (BNF, lift/ctrl.lift/nolift, with plr on and bcp off) so as to evaluate also the effect of plr and bcp separately. We notice that the plots are clearly clustered into three groups of increasing performance: BNF-*, BNF-*-plr, and BNF-*-plr-bcp, with "*" representing the three options lift/ctrl.lift/nolift. This highlights the fact that on this suite on-the-fly Pure Literal Reduction significantly improves the performances, that on-the-fly Boolean Constraint Propagation introduces drastic improvements, and that the variations due to Box Lifting are minor wrt. the other two optimizations.
Overall, the configuration BNF-lift-plr-bcp turns out to be the best performer on this suite, with a tiny advantage wrt. BNF-ctrl.lift-plr-bcp.

Results on the Random □m-CNF Benchmark Suite
The results on the random □m-CNF benchmark suite are reported in Figures 4 and 5.
In Figure 4 we report the 50th-percentile CPU times required to encode and solve the formulas by the different Km2SAT+Rsat variants for the hardest benchmark problems. We do not report the percentage of solved problems since it is always 100%, i.e., Km2SAT+Rsat terminates within the timeout for every problem in the benchmark suite.
The tests with depth d = 1 (see the results on the hardest problems of the class in the first row of Figure 4) are simply too easy for Km2SAT+Rsat (but not for its competitors, see Section 5.3), which solved every sample formula in less than 1 second. Although the tests shown in the second and third rows of Figure 4 are more challenging, they are all solved within the timeout as well. We have noticed also that the results are rather regular, since there are no big gaps between 50th- and 90th-percentile values.
X axis: number of solved problems; Y axis: total CPU time spent (sorting problems from the easiest to the hardest).
In general, we do not have relevant performance gaps between the various configurations on this benchmark suite; it seems that in the majority of cases ctrl.lift slightly beats nolift and nolift slightly beats lift. These gaps are even more relevant if we consider the size of the formulas generated (Figure 5). We believe that this may be due to the fact that random □m-CNF formulas may contain lots of shared subformulas □_r ψ, so that lifting may cause a reduction of such sharing (see Section 3). Further, plr-bcp does not seem to introduce relevant improvements here. We believe that this is due to the fact that these random formulas produce pure and unit literals with very low or even zero probability.
Overall, the configuration BNF-nolift turns out to be the best performer on this suite, with a slight advantage wrt. BNF-ctrl.lift-plr-bcp.
Finally, from some plots of Figure 4 we notice that for Km2SAT+Rsat the problems tend to be harder within the satisfiability/unsatisfiability transition area. (This fact holds especially for Racer and *SAT, see Section 5.3.) This seems to confirm the fact that the easy-hard-easy pattern of random k-SAT extends also to □m-CNF, as already observed in the literature (Giunchiglia & Sebastiani, 1996, 2000; Giunchiglia et al., 2000; Horrocks et al., 2000; Patel-Schneider & Sebastiani, 2003).

Results on the TANCS 2000 Benchmark Suite
The comparison among the Km2SAT variants on the TANCS 2000 benchmark is presented in Figures 6 and 7, where different BNF variants of Km2SAT are compared both enabling or disabling lift/ctrl.lift and plr-bcp.
In Figure 6, from top-left to bottom, we present the cumulative CPU times spent by Km2SAT+Rsat on the easy, medium and hard categories respectively (the corresponding plots reporting the non-cumulative CPU times are included in the online appendix). In Figure 7 we present the plots of the number of variables and clauses of the formulas solved for the more challenging cases of the medium and hard problems.31 We notice that there are only slight differences among the different variants of Km2SAT; BNF with lift is the best option, which allows for solving more problems in the hard class while requiring less CPU time.
We remark that, despite the expected exponential growth of the encoded formulas wrt. the modal depth, in this benchmark Km2SAT+Rsat allows for encoding and solving problems of modal depth greater than 100 for the easy class and greater than 50 for the medium and hard classes, producing and solving SAT-encoded formulas with more than 10^7 variables and 1.4·10^7 clauses.

An Empirical Comparison wrt. the Other Approaches
We proceed with the comparison of our approach wrt. the current state of the art, evaluating Km2SAT+Rsat against the other Km-satisfiability solvers listed above. In more detail, we chose to compare the performance of the other solvers against both the best "local" Km2SAT+Rsat variant for the single benchmark suite and the best "global" Km2SAT+Rsat variant among all the benchmark suites, which we have identified in BNF-ctrl.lift-plr-bcp.

Comparison on the LWB Benchmark Suite
The results on the LWB benchmark suite are summarized numerically and graphically in Table 2. From Table 2(a) we notice a few facts: Racer and *SAT are the best performers (confirming the analysis done by Horrocks et al., 2000) with a significant gap wrt. the others; then, K-QBF+sKizzo solves a few more problems than Km2SAT+Rsat; then follows KBDD which outperforms Mspass, which is the worst performer. In detail, Km2SAT+Rsat is (one of) the worst performer(s) on k_d4_* and k_t4_*, the fourth best performer on k_path_n, the third best performer on k_path_p and k_branch_p, and it is (one of) the best performer(s) on k_branch_n, k_dum_*, k_grz_*, k_lin_*, k_ph_* and k_poly_*; it is the absolute best performer on k_branch_n and k_ph_p.
In Table 2(b) we give a graphical representation of this comparison, plotting the number of solved problems by each approach against the total cumulative amount of CPU time spent. We notice that, even if Km2SAT+Rsat solves a few fewer problems than K-QBF+sKizzo, Km2SAT+Rsat is mostly faster than K-QBF+sKizzo.

Comparison on the Random □m-CNF Benchmark Suite
In the random □m-CNF benchmark suite the results are dominated by Km2SAT+Rsat.
For the hardest categories among the three groups of problems used in the evaluation, we report in Figure 8 the number of problems solved by each tool within the timeout, and in Figure 9 the median CPU time (i.e., the 50th percentile).
Looking at Figure 8 we notice that Km2SAT+Rsat (in both versions) is the only tool which always terminates within the timeout, whilst *SAT and Racer sometimes do not terminate in the hardest problems, K-QBF+sKizzo very often does not terminate, and Mspass and KBDD do not terminate for most values.
In Figure 9 we notice that Km2SAT+Rsat is most often the best performer (in particular with the hardest problems), followed by *SAT and Racer. (This is even more evident if we consider the 90th percentile of CPU time, whose plots are included in the online appendix.) In all these tests, K-QBF+sKizzo, Mspass and KBDD are drastically outperformed by the others.

Comparison on the TANCS 2000 Benchmark Suite
The results of the TANCS 2000 benchmark are summarized in Figure 10, from the easy category (top-left) to the hard category (bottom).
From the plots we notice that the relative performances of the tools under test vary significantly with the category: Racer and *SAT are among the best performers in all categories; K-QBF +sKizzo behaves well on the easy and medium categories but solves very few problems on the hard one; KBDD behaves very well on the easy category, but solves very few problems in the medium and hard ones. Mspass is among the worst performers in all categories: in particular, it does not solve any problem in the hard category. Finally, K m 2SAT +Rsat is the worst performer on the easy category, it outperforms all competitors but *SAT and Racer on the medium category, and it competes head-to-head with both Racer and *SAT for the first position on the hard category: the "local-best" configuration BNF-lift beats both competitors; the "global-best" configuration BNF-ctrl.lift-prl-bcp solves as many problems as Racer, with a one-order-of-magnitude CPU-time performance gap, and two problems fewer than *SAT.
Notice that the classification of the benchmark problems into "easy", "medium" and "hard" given by Massacci and Donini (2000) is based on the translation schema used to produce every modal problem and refers to its "reasoning component", but it is not necessarily related to other components (e.g., the modal depth) which affect the size of our encoding and, hence, the efficiency of our approach. This may explain the fact, e.g., that the "easy" problems are not so easy for our approach, and vice versa.
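The curves in Figure 10 plot, for each tool, the number of solved problems against the total cumulative CPU time, with problems sorted from the easiest to the hardest. The following is a minimal sketch of how such a curve can be computed from raw run times; the function name `cumulative_curve` and the sample data are hypothetical, and discarding unsolved problems from the curve is an assumption.

```python
from itertools import accumulate


def cumulative_curve(times, timeout):
    """Return the points (number_solved, cumulative_cpu_time) of one curve.

    `times` holds one CPU time per problem, or None for a run that did
    not terminate. Only problems solved within the timeout contribute.
    """
    solved = sorted(t for t in times if t is not None and t <= timeout)
    # Point n on the curve is the total time spent on the n easiest problems.
    return list(enumerate(accumulate(solved), start=1))


# Hypothetical run times for four problems; the second run timed out.
curve = cumulative_curve([3.0, None, 1.0, 2.0], timeout=10.0)
```

Under this reading, a curve that extends further to the right belongs to a tool that solves more problems, and a lower curve indicates less total CPU time for the same number of solved problems.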

Discussion
As highlighted by Giunchiglia et al. (2000), and Horrocks et al. (2000), the satisfiability problem in modal logics like K m is characterized by the alternation of two orthogonal components of reasoning: a Boolean component, performing Boolean reasoning within each state, and a modal component, generating the successor states of each state. The latter must cope with the fact that the candidate models may be up to exponentially big wrt. depth(ϕ), whilst the former must cope with the fact that there may be up to exponentially many candidate (sub)models to explore. In the K m 2SAT +DPLL approach the encoder has to handle the whole modal component (rules (8) and (9)), whilst the handling of the whole Boolean component is delegated to an external SAT solver.
From the results displayed in Section 5.3 we notice that the relative performances of the K m 2SAT +DPLL approach wrt. other state-of-the-art tools range from cases where K m 2SAT +Rsat is much less efficient than other state-of-the-art approaches (e.g., the k d4 and k t4p formulas) up to formulas where it is much more efficient (e.g., the k ph p or the 2 m -CNF formulas with d = 1). In the middle stands a large majority of formulas in which the approach competes well against the other state-of-the-art tools; in particular, K m 2SAT +Rsat competes very well with, or even outperforms, the other approaches based on translations into different formalisms (the translational approach, the automata-theoretic approach and the QBF-encoding approach).
A simple explanation of the former fact could be that the K m 2SAT +DPLL approach loses on problems with high modal depth, or where the modal component of reasoning dominates (like, e.g., the k d4 and k t4p formulas), and wins on problems where the Boolean component of reasoning dominates (like, e.g., the k ph n or the 2 m -CNF formulas with d = 1), and it is competitive for formulas in which both components are relevant.
We notice, however, that K m 2SAT +Rsat wins in the hard category of TANCS 2000 benchmarks, with modal depths greater than 50, and on k branch n, where the search is dominated by the modal component.³² In fact, we recall that reducing the Boolean component of reasoning may also produce a reduction of the modal reasoning effort, because it may reduce the number of successor states to analyze (e.g. 2007, 2007). Thus, e.g., techniques like on-the-fly BCP, although exploiting only purely-Boolean properties, may produce not only a drastic pruning of the Boolean search, but also a drastic reduction in the size of the model investigated, because they cut a priori the amount of successor states to expand.
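To make the role of BCP concrete, the following is a minimal sketch of Boolean constraint propagation (unit propagation) on a CNF formula. It is a generic illustration, not the on-the-fly BCP routine of the encoder; the function name `bcp`, the DIMACS-style integer literals and the three-clause example are all hypothetical.

```python
def bcp(clauses, assignment=None):
    """Unit propagation on CNF clauses given as lists of nonzero integers.

    A positive integer v stands for variable v, a negative one for its
    negation. Returns the extended assignment (dict variable -> bool),
    or None if some clause is falsified (a conflict).
    """
    assignment = dict(assignment or {})
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned = []
            satisfied = False
            for lit in clause:
                var, wanted = abs(lit), lit > 0
                if var not in assignment:
                    unassigned.append(lit)
                elif assignment[var] == wanted:
                    satisfied = True
                    break
            if satisfied:
                continue
            if not unassigned:
                return None  # every literal is false: conflict
            if len(unassigned) == 1:
                # Unit clause: its only open literal must be true.
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0
                changed = True
    return assignment


# Hypothetical atoms: 1 labels the root formula, 2 labels a diamond
# subformula in the root state, 3 labels its body in a successor state.
clauses = [[1], [-1, -2], [-2, 3]]
result = bcp(clauses)
```

In this toy instance propagation assigns atom 2 to false, so the clause that would introduce the successor-state atom 3 is already satisfied and atom 3 is never assigned. This mirrors, in a very simplified way, how purely Boolean propagation can remove the need to expand a successor state.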

Related Work and Research Trends
In the last fifteen years one main research line in description logic has focused on investigating increasingly expressive logics, with the goal of establishing the theoretical boundaries of decidability and of allowing for more expressive power in the languages defined (Baader, Calvanese, McGuinness, Nardi, & Patel-Schneider, 2003). Consequently, very expressive (though very hard) description logics have today notable applications in the field of the Semantic Web. For example, the SHOIN (D) logic (which has NExpTime complexity) captures the sub-language OWL DL of the full OWL (Web Ontology Language) language (Bechhofer, van Harmelen, Hendler, Horrocks, McGuinness, Patel-Schneider, & Stein, 2004), which is the recommended standard language for the semantic web proposed by the W3C consortium.
In contrast, the recent quest for tractable description logic-based languages arising from the field of bio-medical ontologies (e.g., Spackman, Campbell, & Cote, 1997; Sioutos, de Coronado, Haber, Hartel, Shaiu, & Wright, 2007; The Gene Ontology Consortium, 2000; Rector & Horrocks, 1997) has stimulated the opening of another research line on tractable description logics (also called lightweight description logics), which are suitable for reasoning on these very big bio-medical ontologies. In particular, Baader et al. (2005, 2006, 2007, 2008) have spent a considerable effort in the attempt of defining a small but maximal subset of logical constructors, expressive enough to cover the needs of these practical applications, but whose inference problems must remain tractable. For example, simple and tractable description logics like EL, EL + and EL ++ (Baader et al., 2005) are expressive enough to describe several important bio-medical ontologies such as SNoMed (Spackman et al., 1997), NCI (Sioutos et al., 2007), the Gene Ontology (The Gene Ontology Consortium, 2000) and the majority of Galen (Rector & Horrocks, 1997).
32. The k branch n formulas are very hard from the perspective of modal reasoning, because they require finding one model M with $2^{d+1} - 1$ states (Halpern & Moses, 1992), but no Boolean reasoning within each state is really required (Giunchiglia et al., 2000; Horrocks et al., 2000): e.g., *SAT solves k branch n(d) with $2^{d+1} - 1$ calls to its embedded DPLL engine, one for each state of M, each call solved by BCP only.
Reasoning on these ontologies represents not only an important application of lightweight description logics, but also a challenge due to the required efficiency and the huge dimensions of the ontologies. In this perspective, it is important to face efficiently not only the basic reasoning services (e.g., satisfiability, subsumption, queries) on logics like EL, EL + and EL ++ , but also more sophisticated reasoning problems such as axiom pinpointing (Baader et al., 2007; Baader & Peñaloza, 2008) and logical difference between terminologies (Konev, Walther, & Wolter, 2008).

Conclusions and Future Work
In this paper we have explored the idea of encoding K m /ALC-satisfiability into SAT, so that it can be handled by state-of-the-art SAT tools. We have shown that, despite the intrinsic risk of blowup in the size of the encoded formulas, the performances of this approach are comparable with those of current state-of-the-art tools on a rather extensive variety of empirical tests. Furthermore, we remark that our approach allows for directly benefitting "for free" from current and future enhancements in SAT solver technology.
We see many possible directions to explore in order to enhance and extend our approach. An important open research line is to explore the feasibility of SAT encodings for other and more expressive modal and description logics (e.g., whilst for logics like T m the extension should be straightforward, logics like S4 m , or description logics more elaborate than ALC, should be challenging) and for more complex forms of reasoning (e.g., including TBoxes and ABoxes).
Our current investigation, however, focuses on the lightweight logics of Baader et al. (2005). We have investigated (and we are currently enhancing) an encoding of the main reasoning services in EL and EL + into Horn-SAT, which allows for reasoning efficiently on the (often huge) bio-medical ontologies mentioned in Section 6, and for handling the more sophisticated inference problems mentioned there (e.g., we currently handle axiom pinpointing), by exploiting some of the advanced functionalities which can be implemented on top of a modern SAT solver (Sebastiani & Vescovi, 2009).

where µ A is a consistent truth assignment for the literals L σ, A i s.t. A i ∈ A and σ ∈ M.
By construction, for every L σ, ψ in K m 2SAT (ϕ), µ assigns L σ, ψ to true iff it assigns L σ, ¬ψ to false and vice versa, so that µ is a consistent truth assignment.
33. We assume that µ M , µ πν and µ αβ are generated in order, so that µ αβ is generated recursively starting from µ πν .
Intuitively, µ M assigns the literals L σ, ψ s.t. σ ∈ M in such a way to mimic M; µ M assigns the other literals in such a way to mimic the fact that no state outside those in M is generated (i.e., all L σ, π 's are assigned false and the L σ, ν 's, L σ, α 's, L σ, β 's are assigned consequently).
Notice that, if σ ∉ M, then σ.i ∉ M for every i. Thus, for every Def (σ, ψ) s.t. σ ∉ M, all atoms in the implication definition of Def (σ, ψ) are not assigned by µ M .
Second, we show by induction on the recursive structure of µ M that µ M satisfies the definition implications of all Def (σ, ψ)'s and Def (σ, ψ)'s s.t. σ ∈ M. Let σ ∈ M.
As a base step, by (29), µ πν satisfies the definition implications of all Def (σ, π r,i )'s and Def (σ, ν r )'s because it assigns false to all L σ, π r,i 's. Indeed, µ A assigns every literal of the type L σ, A i s.t. A i ∈ A and σ ∈ M (notice that all the Def (σ, A i )'s definitions are trivially satisfied and do not contain any definition implications).

Figure 1: Schema of a modern SAT solver engine based on DPLL.

Figure 2: Empirical analysis of K m 2SAT on Halpern & Moses formulas wrt. the depth parameter h, for different options of the encoder. 1st row: k branch n, corresponding to K m 2SAT (ϕ K h ), formulas (satisfiable); 2nd row: k branch p, corresponding to K m 2SAT (ϕ K h ∧ 2 h P h 3 +1), formulas (unsatisfiable). Left: number of Boolean variables; center: number of clauses; right: total CPU time required for encoding+solving (where the solving step has been performed through Rsat). See Section 5 for more technical details.

• the number of distinct box symbols m;
• the percentage p of purely-propositional literals in clauses occurring at depth < d, s.t. each clause of length k contains on average p · k randomly-picked Boolean literals and k − p · k randomly-generated modal literals □ r ψ, ¬□ r ψ.²⁷

Figure 3: Comparison of different variants of K m 2SAT +Rsat on the LWB problems. X axis: number of solved problems; Y axis: total CPU time spent (sorting problems from the easiest to the hardest).

Figure 6: Comparison among different variants of K m 2SAT +Rsat on TANCS 2000 problems. X axis: number of solved problems. Y axis: total cumulative CPU time spent. 1st (top-left) to 3rd (bottom) plot: easy, medium, hard problems. (Problems are sorted from the easiest to the hardest.)

Figure 7: Comparison among different variants of K m 2SAT on TANCS 2000 problems. X axis: number of the harder solved problem. Y axis: 1st column: #variables in the SAT encoding of the problem; 2nd column: #clauses in the SAT encoding of the problem. 1st to 2nd row: medium, hard problems.

Figure 10: Comparison against other approaches on TANCS 2000 problems. X axis: number of solved problems. Y axis: total cumulative CPU time spent. 1st (top-left) to 3rd (bottom) plot: easy, medium, hard problems. (Problems are sorted from the easiest to the hardest.)

An analogous situation happens with ϕ bnf lift in Example 4.3: while building 1. and 2. a contradiction is found by BCP, s.t. K m 2SAT returns ⊥ without expanding the formula any further. The same holds for ϕ bnf clift in Example 4.4: while building 1., 2. and 3. a contradiction is found by BCP, s.t. K m 2SAT returns ⊥ without expanding the formula any further.

4.9 A Paradigmatic Example: Halpern & Moses Branching Formulas

Table 1: Comparison of the variants of K m 2SAT +Rsat on the LWB benchmarks.

Table 2: Comparison of K m 2SAT +Rsat against the other tools on the LWB benchmarks.