Coherent Integration of Databases by Abductive Logic Programming

We introduce an abductive method for a coherent integration of independent data-sources. The idea is to compute a list of data-facts that should be inserted into the amalgamated database or retracted from it in order to restore its consistency. This method is implemented by an abductive solver, called Asystem, that applies SLDNFA-resolution on a meta-theory that relates different, possibly contradicting, input databases. We also give a pure model-theoretic analysis of the possible ways to `recover' consistent data from an inconsistent database in terms of those models of the database that exhibit as little inconsistent information as reasonably possible. This allows us to characterize the `recovered databases' in terms of the `preferred' (i.e., most consistent) models of the theory. The outcome is an abductive-based application that is sound and complete with respect to a corresponding model-based, preferential semantics, and -- to the best of our knowledge -- is more expressive (thus more general) than any other implementation of coherent integration of databases.


Introduction
Complex reasoning tasks often have to integrate information that comes from different sources. One of the major challenges in this respect is to compose contradicting sources of information such that what is obtained properly reflects the combination of the data-sources on the one hand, and is still coherent (in terms of consistency) on the other hand. There are a number of different issues involved in this process, the most important of which are the following:
1. Unification of the different ontologies and/or database schemas, in order to get a fixed (global) schema, and a translation of the integrity constraints of each database to the new ontology.
2. Unification of translated integrity constraints in a single global set of integrity constraints. This means, in particular, elimination of contradictions among the translated integrity constraints, and inclusion of any global integrity constraint that is imposed on the integration process.
3. Integration of databases w.r.t. the unified set of integrity constraints, computed according to the previous item.
Each one of the issues mentioned above has its own difficulties and challenges. For instance, the first issue is considered, e.g., by Ullman (2000) and Lenzerini (2001, 2002), where questions such as how to express the relations between the 'global database schema' and the source (local) schemas, and how this influences query processing with respect to the global schema (Bertossi, Chomicki, Cortés, & Gutierrez, 2002), are dealt with.
The second issue above is concerned with the construction of a single, classically consistent, set of integrity constraints, applied on the integrated data. In the database context, it is common to assume that such a set is pre-defined, and consists of global integrity constraints that are imposed on the integration process itself (Bertossi et al., 2002; Lenzerini, 2002). In such a case there is no need to derive these constraints from the local databases. When different integrity constraints are specified in different local databases, it is required to integrate not only the database instances (as specified in issue 3 above), but also the integrity constraints themselves (issue 2). The reason for separating these two topics is that integrity constraints represent truths that should be valid in all situations, while a database instance exhibits an extensional truth, i.e., an actual situation. Consequently, the policy of resolving contradictions among integrity constraints is often different from the one that is applied on database facts, and often the former is applied before the latter.
Despite their different nature, both issues are based on some formalisms that maintain contradictions and allow one to draw plausible conclusions from inconsistent situations. Roughly, there are two approaches to handle this problem:
• Paraconsistent formalisms, in which the amalgamated data may remain inconsistent, but the set of conclusions implied by it is not explosive, i.e., not every fact follows from an inconsistent database, and so the inference process does not become trivial in the presence of contradictions. Paraconsistent procedures for integrating data, like those of Subrahmanian (1994) and de Amo, Carnielli, and Marcos (2002), are often based on paraconsistent reasoning systems, such as LFI (Carnielli & Marcos, 2001), annotated logics (Subrahmanian, 1990; Kifer & Lozinskii, 1992; Arenas, Bertossi, & Kifer, 2000), or other non-classical proof procedures (Priest, 1991; Arieli & Avron, 1996; Avron, 2002; Carnielli & Marcos, 2002).
In this paper we follow the latter approach, and consider abductive approaches that handle the third issue above, namely: coherent methods for integrating different data-sources (with the same ontology) w.r.t. a consistent set of integrity constraints 5. The main difficulty in this process stems from the fact that even when each local database is consistent, the collective information of all the data-sources may no longer be consistent. In particular, facts that are specified in a particular database may violate some integrity constraints defined elsewhere, and so this data might contradict some elements in the unified set of integrity constraints. Moreover, as noted, e.g., in (Lenzerini, 2001; Cali, Calvanese, De Giacomo, & Lenzerini, 2002), the ability to handle, in a plausible way, incomplete and inconsistent data is an inherent property of any system for data integration with integrity constraints, no matter which integration phase is considered. Providing proper ways of gaining this property is a major concern here as well.
Our goal is therefore to find ways to properly 'repair' a combined (unified) database, and restore its consistency. For this, we consider a pure declarative representation of the composition of distributed data by a meta-theory, relating a number of different input databases (that may contradict each other) with a consistent output database. The underlying language of the theory is that of abductive logic programming (Kakas, Kowalski, & Toni, 1992; Denecker & Kakas, 2000). For reasoning with such theories we use an abductive system, called Asystem (Kakas, Van Nuffelen, & Denecker, 2001; Van Nuffelen & Kakas, 2001), which is an abductive solver implementing SLDNFA-resolution (Denecker & De Schreye, 1992, 1998). The composing system is implemented by abductive reasoning on the meta-theory. In the context of this work, we have extended this system with an optimizing component that allows us to compute preferred coherent ways to restore the consistency of a given database. The system that is obtained induces an operational semantics for database integration. In the sequel we also consider some model-theoretic aspects of the problem, and define a preferential semantics (Shoham, 1988) for it. According to this semantics, the repaired databases are characterized in terms of the preferred models (i.e., the most-consistent valuations) of the underlying theory. We relate these approaches by showing that the Asystem is sound and complete w.r.t. the model-based semantics. It is also noted that our framework supports reasoning with various types of special information, such as timestamps and source identification. Some implementation issues and experimental results are discussed as well.
5. In this sense, one may view this work as a method for restoring the consistency of a single inconsistent database. We prefer, however, to treat it as an integration process of multiple sources, since it also has some mediating capabilities, such as source identification, making priorities among different data-sources, etc. (see, e.g., Section 4.6).
The rest of this paper is organized as follows: in the next section we formally define our goal, namely: a coherent way to integrate different data-sources. In Section 3 we set up a semantics for this goal in terms of a corresponding model theory. Then, in Section 4 we introduce our abductive-based application for database integration. This is the main section of this paper, in which we also describe how a given integration problem can be represented in terms of meta logic programs, show how to reason with these programs by abductive computational models, present some experimental results, consider proper ways of reasoning with several types of special data, and show that our application is sound and complete with respect to the model-based semantics considered in Section 3. Section 5 contains an overview of some related works, and in Section 6 we conclude with some further remarks, open issues, and future work.

Coherent Integration of Databases
We begin with a formal definition of our goal. In this paper we assume that we have a first-order language L, based on a fixed database schema S, and a fixed domain D. Every element of D has a unique name. A database instance D consists of atoms in the language L that are instances of the schema S. As such, every instance D has a finite active domain, which is a subset of the domain D.
Definition 1 A database is a pair (D, IC), where D is a database instance, and IC, the set of integrity constraints, is a finite and classically consistent set of formulae in L.
Given a database DB = (D, IC), we apply to it the closed world assumption, so only the facts that are explicitly mentioned in D are considered true. The underlying semantics corresponds, therefore, to minimal Herbrand interpretations.

Definition 2 The minimal Herbrand model H^D of a database instance D is the model of D that assigns true to all the ground instances of atomic formulae in D, and false to all the other atoms.
There are different views on a database. One view is that it is a logic theory consisting of atoms and, implicitly, the closed world assumption (CWA) that indicates that all atoms not in the database are false. Another common view of a database is that it is a structure that consists of a certain domain and corresponding relations, representing the state of the world. Whenever there is complete knowledge and all true atoms are represented in the database, both views coincide: the unique Herbrand model of the theory is the intended structure. However, in the context of independent data-sources, the assumption that each local database represents the state of the world is obviously false. We can still view a local database as an incomplete theory, though, and so treating a database as a theory rather than as a structure is more appropriate in our case.
Our goal is to integrate n consistent local databases, DB_i = (D_i, IC_i) (i = 1, ..., n), into one consistent database that contains as much information as possible from the local databases. The idea, therefore, is to consider the union of the distributed data, and then to restore its consistency in such a way that as much information as possible is preserved.
Notation 1 Let DB_i = (D_i, IC_i), i = 1, ..., n, and let I(IC_1, ..., IC_n) be a classically consistent set of integrity constraints. We denote: UDB = (D_1 ∪ ... ∪ D_n, I(IC_1, ..., IC_n)).
In the notation above, I is an operator that combines the integrity constraints and eliminates contradictions (see, e.g., Alferes, Leite, Pereira, & Quaresma, 2000; Alferes, Pereira, Przymusinska, & Przymusinski, 2002). As we have already noted, how to choose this operator and how to apply it on a specific database is beyond the scope of this paper. In cases where the union of all the integrity constraints is classically consistent, it makes sense to take I as the union operator. Global consistency of the integrity constraints is indeed a common assumption in the database literature (Arenas et al., 1999; Greco & Zumpano, 2000; Greco, Greco, & Zumpano, 2001; Bertossi et al., 2002; Konieczny & Pino Pérez, 2002; Lenzerini, 2002), but for the discussion here it is possible to take, instead of the union, any operator I for consistency restoration.
A key notion in database integration is that of a repair of a database DB = (D, IC), which is a pair (Insert, Retract) of sets of atoms. Intuitively, Insert is a set of elements that should be inserted into D and Retract is a set of elements that should be removed from D in order to have a consistent database.
Definition 6 A repaired database of DB = (D, IC) is a consistent database of the form (D ∪ Insert \ Retract , IC), where (Insert, Retract) is a repair of DB.
As there may be many ways to repair an inconsistent database, it is often convenient to make preferences among the possible repairs, and consider only the most preferred ones. Below are two common preference criteria for preferring a repair (Insert, Retract) over a repair (Insert′, Retract′):
Definition 7 Let (Insert, Retract) and (Insert′, Retract′) be two repairs of a given database.
In what follows we assume that the preference relation ≤ is a fixed pre-order that represents some preference criterion on the set of repairs (and we shall omit subscript notations in it whenever possible).We shall also assume that if (∅, ∅) is a valid repair, it is the ≤-least (i.e., the 'best') one.This corresponds to the intuition that a database should not be repaired unless it is inconsistent.
The set of all the ≤-preferred repairs of DB is denoted by !(DB, ≤).
Definition 9 A ≤-repaired database of DB is a repaired database of DB, constructed from a ≤-preferred repair of DB. The set of all the ≤-repaired databases is denoted by: R(DB, ≤) = { (D ∪ Insert \ Retract, IC) | (Insert, Retract) ∈ !(DB, ≤) }.
Note that if DB is consistent and ≤ is a preference relation, then DB is the only ≤-repaired database of itself (thus, there is nothing to repair in this case, as expected).
It is easy to see that
Given n databases and a preference criterion ≤, our goal is therefore to compute the set R(UDB, ≤) of the ≤-repaired databases of the unified database, UDB (Notation 1). The reasoner may use different strategies to determine the consequences of this set. Among the common approaches are the skeptical (conservative) one, which is based on a 'consensus' among all the elements of R(UDB, ≤) (see Arenas et al., 1999; Greco & Zumpano, 2000), a 'credulous' approach, in which entailments are determined by any element in R(UDB, ≤), an approach that is based on a 'majority vote' (Lin & Mendelzon, 1998; Konieczny & Pino Pérez, 2002), etc. In cases where processing time is a major consideration, one may want to speed up the computations by considering any repaired database. In such cases it is sufficient to find an arbitrary element in the set R(UDB, ≤).
Below are some examples 9 of the integration process 10 .
Example 1 Consider a relation teaches of the schema (course name, teacher name), and an integrity constraint, stating that the same course cannot be taught by two different teachers:
Consider now the following two databases:
Clearly, the unified database DB_1 ∪ DB_2 is inconsistent. It has two preferred repairs, which are (∅, {teaches(c_2, n_3)}) and (∅, {teaches(c_2, n_2)}). The corresponding repaired databases are the following:
Thus, e.g., teaches(c_1, n_1) is true both in the conservative approach and the credulous approach to database integration, while the conclusion teaches(c_2, n_2) is supported only by credulous reasoning.
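The repairs reported in Example 1 can be reproduced by exhaustive search. The following Python sketch is only an illustration and not the abductive method of this paper: the unified instance D, the finite universe of candidate atoms, and the reading of the set-inclusion criterion as minimality of Insert ∪ Retract are assumptions chosen to agree with the repairs stated in the example.

```python
from itertools import combinations

# Hypothetical unified instance of the relation teaches(course, teacher);
# assumed so that it agrees with the repairs reported in Example 1.
D = frozenset({("c1", "n1"), ("c2", "n2"), ("c2", "n3")})
COURSES = ["c1", "c2"]
TEACHERS = ["n1", "n2", "n3"]
UNIVERSE = [(c, n) for c in COURSES for n in TEACHERS]

def satisfies_ic(instance):
    """IC: the same course is not taught by two different teachers."""
    courses = [c for c, _ in instance]
    return len(courses) == len(set(courses))

def all_repairs(d):
    """All pairs (Insert, Retract) that turn d into a consistent instance."""
    repairs = []
    for size in range(len(UNIVERSE) + 1):
        for atoms in combinations(UNIVERSE, size):
            candidate = frozenset(atoms)
            if satisfies_ic(candidate):
                repairs.append((candidate - d, d - candidate))
    return repairs

def inclusion_preferred(repairs):
    """Repairs whose change set Insert ∪ Retract is minimal w.r.t. inclusion
    (an assumed reading of the set-inclusion criterion)."""
    changes = [ins | ret for ins, ret in repairs]
    return [r for r, ch in zip(repairs, changes)
            if not any(other < ch for other in changes)]

preferred = inclusion_preferred(all_repairs(D))
repaired = [(D | ins) - ret for ins, ret in preferred]
skeptical = frozenset.intersection(*repaired)   # facts true in every repaired database
credulous = frozenset.union(*repaired)          # facts true in some repaired database
```

Under these assumptions, `skeptical` contains only the fact about course c1, while `credulous` also contains both competing facts about course c2, which matches the distinction drawn in the example.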
Example 2 Consider databases with relations class and supply, of schemas (item, type) and (supplier, department, item), respectively. Let
where IC = { ∀X∀Y∀Z (supply(X, Y, Z) ∧ class(Z, t_1) → X = c_1) } states that only supplier
9. See, e.g., (Arenas et al., 1999; Greco & Zumpano, 2000; Bertossi & Schwind, 2002) for further discussions on these examples.
10. In all the following examples we use set inclusion as the preference criterion, and take the operator I that combines integrity constraints (see Notation 1) to be the union operator.

Model-based Characterization of Repairs
In this section we set up a semantics for describing repairs and preferred repairs in terms of a corresponding model theory. This will allow us, in particular, to give an alternative description of preferred repairs, this time in terms of a preferential semantics for database theory.
As database semantics is usually defined in terms of two-valued (Herbrand) models (cf. Definition 2 and the discussion that precedes it), it is natural to consider two-valued semantics first. We show that arbitrary repairs can be represented by two-valued models of the integrity constraints. When a database is inconsistent, then by definition, there is no two-valued interpretation which satisfies both its database instance and its integrity constraints. A standard way to cope with this type of inconsistency is to move to multiple-valued semantics for reasoning with inconsistent and incomplete information (see, e.g., Subrahmanian, 1990, 1994; Messing, 1997; Arieli & Avron, 1999; Arenas, Bertossi, & Kifer, 2000; de Amo, Carnielli, & Marcos, 2002). What we will show below is that repairs can be characterized by three-valued models of the whole database, that is, of the database instance and the integrity constraints. Finally, we concentrate on the most preferred repairs, and show that a certain subset of the three-valued models can be used for characterizing ≤-preferred repairs.
Definition 10 Given a valuation ν and a truth value x, denote: ν^x = {p | p is an atomic formula and ν(p) = x}.
The following two propositions characterize repairs in terms of two-valued structures.
Proposition 1 Let (D, IC) be a database and let M be a two-valued model of IC. Let Insert = M^t \ D and Retract = D \ M^t. Then (Insert, Retract) is a repair of (D, IC).
Proof: Consider a valuation M, defined for every atom p as follows:
When a database is inconsistent, it has no models that satisfy both its integrity constraints and its database instance. One common method to overcome such an inconsistency is to introduce additional truth-values that intuitively represent partial knowledge, different amounts of belief, etc. (see, e.g., Priest, 1989, 1991; Subrahmanian, 1990; Fitting, 1991; Arieli, 1999; Arenas et al., 2000; Avron, 2002). Here we follow this guideline, and consider database integration in the context of a three-valued semantics. The benefit of this is that, as we show below, any database has some three-valued models, from which it is possible to pinpoint the inconsistent information, and accordingly construct repairs.
The underlying three-valued semantics considered here is induced by the algebraic structure THREE, shown in the double-Hasse diagram of Figure 1. Viewed horizontally, THREE is a complete lattice. According to this view, f is the minimal element, t is the maximal one, and ⊤ is an intermediate element. The corresponding order relation, ≤_t, intuitively represents differences in the amount of truth that each element exhibits. We denote the meet, join, and the order reversing operation on ≤_t by ∧, ∨, and ¬ (respectively). Viewed vertically, THREE is an upper semilattice. In this view, ⊤ is the maximal element and the two 'classical values' are incomparable. This partial order, denoted by ≤_k, may be intuitively understood as representing differences in the amount of knowledge (or information) that each element represents 13. We denote by ⊕ the join operation on ≤_k 14.
Various semantic notions can be defined on THREE as natural generalizations of similar classical ones: a valuation ν is a function that assigns a truth value in THREE to each atomic formula. Given a valuation ν, truth values x_i ∈ {t, f, ⊤}, and atomic formulae p_i, we shall sometimes write ν = {p_i : x_i} to denote that ν(p_i) = x_i for every i. Any valuation is extended to complex formulae in the obvious way. For instance, ν(¬ψ) = ¬ν(ψ), ν(ψ ∧ φ) = ν(ψ) ∧ ν(φ), and so forth 15. The set of the designated truth values in THREE (i.e., those elements in THREE that represent true assertions) consists of t and ⊤. A valuation ν satisfies a formula ψ iff ν(ψ) is designated. A valuation that assigns a designated value to every formula in a theory T is a (three-valued) model of T.
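The operations of THREE described above are small enough to be written out directly. The following Python sketch is an illustration under stated assumptions: the string encoding of the truth values, the nested-tuple representation of formulae, and all function names are hypothetical and are not part of the paper's formalism.

```python
# Truth values of THREE; TOP stands for the intermediate value ⊤.
F, TOP, T = "f", "⊤", "t"

# Truth order ≤_t: f is minimal, t is maximal, ⊤ lies in between.
_RANK = {F: 0, TOP: 1, T: 2}
_VALUE = {rank: value for value, rank in _RANK.items()}

def conj(x, y):
    """Meet on ≤_t."""
    return _VALUE[min(_RANK[x], _RANK[y])]

def disj(x, y):
    """Join on ≤_t."""
    return _VALUE[max(_RANK[x], _RANK[y])]

def neg(x):
    """Order-reversing operation on ≤_t: swaps t and f, keeps ⊤."""
    return _VALUE[2 - _RANK[x]]

def leq_k(x, y):
    """Knowledge order ≤_k: ⊤ is maximal, t and f are incomparable."""
    return x == y or y == TOP

def oplus(x, y):
    """Join on ≤_k: equal values are kept, distinct values give ⊤."""
    return x if x == y else TOP

def designated(x):
    """The designated values are t and ⊤."""
    return x in (T, TOP)

def evaluate(formula, nu):
    """Extend a valuation nu (dict: atom -> value) to a formula built from
    ("atom", name), ("not", f), ("and", f, g) and ("or", f, g)."""
    op = formula[0]
    if op == "atom":
        return nu[formula[1]]
    if op == "not":
        return neg(evaluate(formula[1], nu))
    if op == "and":
        return conj(evaluate(formula[1], nu), evaluate(formula[2], nu))
    if op == "or":
        return disj(evaluate(formula[1], nu), evaluate(formula[2], nu))
    raise ValueError(f"unknown connective: {op}")

def is_model(nu, theory):
    """nu is a three-valued model of theory iff every formula gets a designated value."""
    return all(designated(evaluate(psi, nu)) for psi in theory)
```

For instance, with `p = ("atom", "p")`, the theory `[p, ("not", p)]` has no model that assigns a classical value to p, but `is_model({"p": TOP}, [p, ("not", p)])` holds, which is the sense in which ⊤ pinpoints inconsistent information.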
Proof: By induction on the structure of ψ. □
We shall write ν ≥_k µ if ν and µ are three-valued valuations for which the condition of Lemma 1 holds.
Lemma 2 If ν ≥ k µ and µ is a model of some theory T , then ν is also a model of T .
Proof: For every formula ψ ∈ T, µ(ψ) is designated. Hence, by Lemma 1, for every formula ψ ∈ T, ν(ψ) is also designated, and so ν is a model of T. □
Next we characterize the repairs of a database DB by its three-valued models:
Proposition 3 Let (D, IC) be a database and let M be a two-valued model of IC. Consider the three-valued valuation N, defined for every atom p by N(p) = H^D(p) ⊕ M(p). Then:
1. N is a three-valued model of D ∪ IC, and
2. (N^⊤ \ D, N^⊤ ∩ D) is a repair of (D, IC).
13. See (Belnap, 1977; Ginsberg, 1988; Fitting, 1991) for a more detailed discussion on these orders and their intuitive meaning.
14. We follow here the notations of Fitting (1990, 1991).
15. As usual, we use here the same logical symbol to denote the connective that appears on the left-hand side of an equation, and the corresponding operator on THREE that appears on the right-hand side of the same equation.
Proposition 3 shows that repairs of a database (D, IC) may be constructed in a standard (uniform) way by considering three-valued models that are the ≤_k-least upper bounds of two two-valued valuations: the minimal Herbrand model of the database instance, and a two-valued model of the integrity constraints. Proposition 4 below shows that any repair of (D, IC) is of this form.
Before we give a proof for Proposition 3, let us demonstrate it by a simple example.

Proof of Proposition 3: Since by the definition of N, N ≥_k H^D, and H^D is a model of D, by Lemma 2 N is a model of D. Similarly, N ≥_k M and M is a model of IC, thus by the same lemma N is also a model of IC.
For the second part, we observe that □
Note that the specific form of the three-valued valuations considered in Proposition 3 is essential here, as the proposition does not hold for every three-valued model of D ∪ IC.
To see this, consider, e.g., D = {}, IC = {p, p → q}, and a three-valued valuation N that assigns ⊤ to p and t to q. Clearly, N is a model of D ∪ IC, but the corresponding update, (N^⊤ \ D, N^⊤ ∩ D) = ({p}, {}), is not a repair of (D, IC), since ({p}, IC) is not a consistent database.
Again, as we have noted above, it is possible to show that the converse of Proposition 3 is also true:
Proof: Consider a valuation N, defined for every atom p as follows:
By the definition of N and since (Insert, Retract) is a repair of (D, IC), we have that
It remains to show that N is a (three-valued) model of D and IC. It is a three-valued model of D because for every p ∈ D, N(p) ∈ {t, ⊤}. Regarding IC, (Insert, Retract) is a repair of (D, IC), thus every formula in IC is true in the least Herbrand model M of D′ = D ∪ Insert \ Retract. In particular, M(q) = t for every q ∈ D′. But since for every p ∈ D ∪ Insert we have that N(p) ∈ {t, ⊤} and D′ ⊆ D ∪ Insert, necessarily N(q) ∈ {t, ⊤} for every q ∈ D′. It follows that for every q ∈ D′, N(q) ≥_k M(q) = t, thus by Lemma 1 and Lemma 2, N must also be a (three-valued) model of D′. Hence N is a model of IC. □
The last two propositions characterize the repairs of UDB in terms of pairs that are associated with certain three-valued models of D ∪ IC. We shall denote the elements of these pairs as follows:
Notation 2 Let N be a three-valued model and let DB = (D, IC) be a database. Denote: Insert^N = N^⊤ \ D and Retract^N = N^⊤ ∩ D.
We conclude this model-based analysis by characterizing the set of the ≤-preferred repairs, where ≤ is one of the preference criteria considered in Definition 7 (i.e., set inclusion or minimal cardinality). As the propositions below show, common considerations on how inconsistent databases can be 'properly' recovered (e.g., keeping the amount of changes as minimal as possible, being 'as close as possible' to the original instance, etc.) can be captured by preferential models in the context of preferential semantics (Shoham, 1988). The idea is to define some order relation on the set of the (three-valued) models of the database. This relation intuitively captures some criterion for making preferences among the relevant models. Then, only the 'most preferred' models (those that are minimal with respect to the underlying order relation) are considered in order to determine how the database should be repaired. Below we formalize this idea:
Definition 11 Given a database DB = (D, IC), denote:
Example 5 Consider again the database DB = ({p, r}, {p → q}) of Example 4. As we have shown, there are six valuations of the form H^D ⊕ M, for some two-valued model M of IC, namely: {p : t, q : ⊤, r : t}, {p : t, q : ⊤, r : ⊤}, {p : ⊤, q : ⊤, r : t}, {p : ⊤, q : ⊤, r : ⊤}, {p : ⊤, q : f, r : t}, {p : ⊤, q : f, r : ⊤}.
The k-minimal models among these models are {p : t, q : ⊤, r : t} and {p : ⊤, q : f, r : t}, thus
Preference orders should reflect some normality considerations applied on the relevant set of valuations (M_DB, in our case); ν is preferable to µ if ν describes a situation that is more common (plausible) than the one described by µ. Hence, a natural way to define preferences in our case is by minimizing inconsistencies. We thus get the following definition:
Definition 12 Let S be a set of three-valued valuations, and N_1, N_2 ∈ S.
16. Note that N is a three-valued valuation and M is a two-valued model of IC.
The following propositions show that there is a close relationship between the most consistent models in M_DB and the preferred repairs of DB.
and Insert ∪ Retract ⊂ Insert^N ∪ Retract^N. By Proposition 4 and its proof, there is an element
and so N is not maximally consistent in M_DB, but this is a contradiction to the definition of N. □
Proof: The pair (Insert, Retract) is in particular a repair of DB, thus by Proposition 2 there is a classical model M of IC such that Insert = M^t \ D and Retract = D \ M^t. Consider the following valuation:
otherwise.
First we show that N = H^D ⊕ M. This is so since if 17. Thus N = H^D ⊕ M, and so N ∈ M_DB. Now, by Proposition 2 again, and by the definition of N,
Retract), and so (Insert, Retract) is not a ≤_i-preferred repair of (D, IC), a contradiction. □
Propositions 5 and 6 may be formulated in terms of ≤_c as follows:
17. Here we use the fact that t ⊕ f = ⊤.
The proofs of the last two propositions are similar to those of Propositions 5 and 6, respectively.
Example 6 Consider again Example 3. We have that:
Thus, H^D = {p(a) : t, p(b) : t, p(c) : f, q(a) : t, q(b) : f, q(c) : t}, and the classical models of IC are those in which either p(y) is false or q(y) is true for every y ∈ {a, b, c}. Now, since in H^D neither p(b) is false nor q(b) is true, it follows that every element in M_UDB must assign ⊤ either to p(b) or to q(b). Hence, the ≤_i-maximally consistent elements in M_UDB (which in this case are also the ≤_c-maximally consistent elements in M_UDB) are the following:
By Propositions 5 and 6, then, the ≤_i-preferred repairs of UDB (which are also its
Example 7 In Examples 4 and 5, the ≤_i-maximally consistent elements (and the ≤_c-maximally consistent elements) of M_DB are N_1 = {p : t, q : ⊤, r : t} and N_2 = {p : ⊤, q : f, r : t}.
It follows that the preferred repairs in this case are ({q}, ∅) and (∅, {p}).
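The computation behind Examples 5 and 7 can be traced with a few lines of Python. This is a brute-force sketch for illustration only, not the procedure used by the Asystem: it enumerates the two-valued models M of IC = {p → q}, forms the valuations H^D ⊕ M, keeps the k-minimal ones, and reads off the pairs (N^⊤ \ D, N^⊤ ∩ D). The encoding of truth values as strings and all function names are assumptions.

```python
from itertools import product

F, TOP, T = "f", "⊤", "t"
ATOMS = ["p", "q", "r"]
D = {"p", "r"}                      # database instance of Examples 4 and 5

def oplus(x, y):
    """Join on the knowledge order ≤_k (in particular, t ⊕ f = ⊤)."""
    return x if x == y else TOP

def leq_k(x, y):
    """Knowledge order on truth values: ⊤ is maximal, t and f are incomparable."""
    return x == y or y == TOP

def satisfies_ic(m):
    """Classical reading of the integrity constraint p → q."""
    return m["p"] == F or m["q"] == T

# Minimal Herbrand model H^D: atoms of D are true, all other atoms are false.
h_d = {a: (T if a in D else F) for a in ATOMS}

# All two-valued models of IC over the three atoms.
two_valued = [dict(zip(ATOMS, values)) for values in product([T, F], repeat=len(ATOMS))]
ic_models = [m for m in two_valued if satisfies_ic(m)]

# The valuations of the form H^D ⊕ M.
candidates = [{a: oplus(h_d[a], m[a]) for a in ATOMS} for m in ic_models]

def below_k(nu, mu):
    """Pointwise extension of ≤_k to valuations."""
    return all(leq_k(nu[a], mu[a]) for a in ATOMS)

# k-minimal candidates: no other candidate lies strictly below them.
k_minimal = [n for n in candidates
             if not any(below_k(m, n) and m != n for m in candidates)]

def repair_of(n):
    """The pair (N^⊤ \\ D, N^⊤ ∩ D) associated with a valuation N."""
    top_atoms = {a for a in ATOMS if n[a] == TOP}
    return (top_atoms - D, top_atoms & D)

repairs = [repair_of(n) for n in k_minimal]
```

With these definitions, `candidates` holds the six valuations listed in Example 5, `k_minimal` holds the two valuations N_1 and N_2 of Example 7, and `repairs` consists of ({q}, ∅) and (∅, {p}).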
To summarize, in this section we have considered a model-based, three-valued preferential semantics for database integration. We have shown (Propositions 5-8) that common and natural criteria for making preferences among possible repairs (i.e., set inclusion and minimal cardinality) can be expressed by order relations on three-valued models of the database. The two ways of making preferences (among repairs on one hand and among three-valued models on the other hand) are thus strongly related, and induce two alternative approaches for database integration. In the next section we shall consider a third approach to the same problem (aimed at providing an operational semantics for database integration) and relate it to the model-based semantics discussed above.

Computing Repairs through Abduction
In this section we introduce an abductive system that consistently integrates possibly contradicting data-sources. This system computes, for a set of data-sources and a preference criterion ≤, the corresponding ≤-repaired databases. Our framework is composed of an abductive logic program (Denecker & Kakas, 2000) and an abductive solver Asystem (Kakas, Van Nuffelen, & Denecker, 2001; Van Nuffelen & Kakas, 2001) that is based on the abductive refutation procedure SLDNFA (Denecker & De Schreye, 1992, 1998). In the first three parts of this section we describe these components: in Section 4.1 we give a general description of abductive reasoning, in Section 4.2 we show how it can be applied to encode database repairs, and in Section 4.3 we describe the 'computational platform'. Then, in Section 4.4 we demonstrate the computation process by a comprehensive example, and in Section 4.5 we specify soundness and completeness results of our approach (with respect to the basic definitions of Section 2 and the model-based semantics of Section 3). Finally, in Section 4.6 we consider some ways of representing special types of data in the system.

Abductive Logic Programming
We start with a general description of abductive reasoning in the context of logic programming. As usual in logic programming, the language contains constants, functions, and predicate symbols. A term is either a variable, a constant, or a compound term f(t_1, ..., t_n), where f is an n-ary function symbol and t_i are terms. An atom is an expression of the form p(t_1, ..., t_m), where p is an m-ary predicate symbol and t_i (i = 1, ..., m) are terms. A literal is an atom or a negated atom. A denial is an expression of the form ∀X(← F), where F is a conjunction of literals and X is a subset of the variables in F. The free variables in F (those that are not in X) can be considered as place holders for objects of unspecified identity (Skolem constants). Intuitively, the body F of a denial ∀X(← F) represents an invalid situation.
Definition 13 (Kakas et al., 1992; Denecker & Kakas, 2000) An abductive logic theory is a triple T = (P, A, IC), where:
• P is a logic program, consisting of clauses of the form h ← l_1 ∧ ... ∧ l_n, where h is an atomic formula and l_i (i = 1, ..., n) are literals. These clauses are interpreted as definitions for the predicates in their heads,
• A is a set of abducible predicates, i.e., predicates that do not appear in the head of any clause in P,
• IC is a set of first-order formulae, called integrity constraints.
All the main model semantics of logic programming can be extended to abductive logic programming. This includes two-valued completion (Console, Theseider Dupre, & Torasso, 1991) and three-valued completion semantics (Denecker & De Schreye, 1993), extended well-founded semantics (Pereira, Aparicio, & Alferes, 1991), and generalized stable semantics (Kakas & Mancarella, 1990b). These semantics can be defined in terms of arbitrary interpretations (Denecker & De Schreye, 1993), but generally they are based on Herbrand interpretations. The effect of this restriction on the semantics of the abductive theory is that a domain closure condition is imposed: the domain of interpretation is known to be the Herbrand universe. A model of an abductive theory under any of these semantics is a Herbrand interpretation H, for which there exists a collection of ground abducible facts ∆, such that H is a model of the logic program P ∪ ∆ (with respect to the corresponding semantics of logic programming) and H classically satisfies every element of IC.
Similarly, for any of the main semantics S of logic programming, one can define the notion of an abductive solution for a query and an abductive theory.
Definition 14 (Kakas et al., 1992; Denecker & Kakas, 2000) An (abductive) solution for a theory (P, A, IC) and a query Q is a set ∆ of ground abducible atoms, each one having a predicate symbol in A, together with an answer substitution θ, such that the following three conditions are satisfied:
In the next section we will use an abductive theory with a non-recursive program to model the database repairs. The next proposition shows that for such abductive theories all Herbrand semantics coincide, and models correspond to abductive solutions for the query true.
Proposition 9 Let T = (P , A , IC) be an abductive theory, such that P is a non-recursive program.Then H is a Herbrand model of the three-valued completion semantics, iff H is a Herbrand model of the two-valued completion semantics, iff H is a generalized stable model of T , iff H is a generalized well-founded model of T .
If H is a model of T , then the set ∆ of abducible atoms in H is an abductive solution for the query true.Conversely, for every abductive solution for true, there exists a unique model H of T , such that ∆ is the set of true abducible atoms in H.

Proof:
The proof is based on the well-known fact that for non-recursive logic programs, all the main semantics of logic programming coincide. In particular, for a non-recursive logic program P there is a Herbrand interpretation H, which is the unique model under each semantics (see, for example, Denecker & De Schreye, 1993).
Let H be a model of T = (P, A, IC) under any of the four semantics mentioned above. Then there exists a collection of ground abducible facts ∆, such that H is a model of the logic program P ∪ ∆ under the corresponding semantics of logic programming. Since P is non-recursive, so is P ∪ ∆. By the above observation, H is the unique model of P ∪ ∆ under any of the above mentioned semantics. Hence, H is a model of T under any of the other semantics. This proves the first part of the proposition.
When H is a Herbrand model of T, there is a set ∆ of abducible atoms such that H is a model of P ∪ ∆. Clearly, ∆ must be the set of true abducible atoms in H. Then P ∪ ∆ is obviously consistent, it entails the integrity constraints of T, and it trivially entails true. Hence, ∆ is an abductive solution for true. Conversely, for any set ∆ of abducible atoms, P ∪ ∆ has a unique model H and the set of true abducible atoms in H is ∆. When ∆ is an abductive solution for true, H satisfies the integrity constraints, and hence H is a model of T. Consequently, H is the unique model of T whose set of true abducible atoms is ∆. □
In addition to the standard properties of abductive solutions for a theory T and a query Q, specified in Definition 14, one frequently imposes optimization conditions on the solutions ∆, analogous to those found in the context of database repairs. Two frequently used criteria are that the generated abductive solution ∆ should be minimal with respect to set inclusion or with respect to set cardinality (cf. Definition 7). The fact that the same preference criteria are used for choosing appropriate abductive solutions and for selecting preferred database repairs does not necessarily mean that there is a natural mapping between the corresponding solutions. In the next sections we will show, however, that meta-programming allows us to map a database repair problem into an abductive problem (w.r.t. the same type of preference criterion).
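The two optimization criteria mentioned above can be illustrated by a small Python sketch. It is not part of the Asystem; the function names and the candidate solutions are hypothetical, and each solution is simply represented as a set of ground abducible atoms written as strings.

```python
from typing import FrozenSet, Iterable, List

Solution = FrozenSet[str]  # a set of ground abducible atoms (assumed encoding)


def inclusion_minimal(solutions: Iterable[Solution]) -> List[Solution]:
    """Keep the solutions that have no proper subset among the candidates."""
    pool = list(set(solutions))
    return [s for s in pool if not any(t < s for t in pool)]


def cardinality_minimal(solutions: Iterable[Solution]) -> List[Solution]:
    """Keep the solutions with the smallest number of abduced atoms."""
    pool = list(set(solutions))
    if not pool:
        return []
    smallest = min(len(s) for s in pool)
    return [s for s in pool if len(s) == smallest]


# Hypothetical candidate solutions, for illustration only.
candidates = [
    frozenset({"retract(a)"}),
    frozenset({"retract(a)", "insert(b)"}),
    frozenset({"insert(c)", "insert(d)"}),
]
```

On these candidates the two criteria differ: {insert(c), insert(d)} is minimal with respect to set inclusion but not with respect to cardinality, which shows that every cardinality-minimal solution is inclusion-minimal while the converse may fail.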

An Abductive Meta-Program for Encoding Database Repairs
The task of repairing the union of n given databases DB_i with respect to the integration of the local integrity constraints IC can be represented by an abductive theory T = (P, A, IC′), where P is a meta-program encoding how a new database is obtained by updating the existing databases, A is the set {insert, retract} of abducible predicates used to describe updates, and IC′ encodes the integrity constraints. In P, facts p that appear in at least one of the databases are encoded by atomic rules db(p), and facts p that appear in the updated database are represented by atoms fact(p). The latter predicate is defined as follows:
fact(X) ← db(X) ∧ ¬retract(X)
fact(X) ← insert(X)
To assure that the predicates insert and retract encode a proper update of the database, the following integrity constraints are also specified:
• An inserted element should not belong to a given database: ← insert(X) ∧ db(X)
• A retracted element should belong to some database: ← retract(X) ∧ ¬db(X)
The set of integrity constraints IC′ is obtained by a straightforward transformation from IC: every occurrence of a database fact p in some integrity constraint is replaced by fact(p).¹⁹
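To make the role of the composer concrete, the following Python sketch mimics its two rules, fact(X) ← db(X) ∧ ¬retract(X) and fact(X) ← insert(X), together with the two requirements on proper updates stated above. This is only an illustration under our own encoding (facts as tuples, sets instead of a logic program); the function names and the course c3 are hypothetical.

```python
def updated_facts(db, insert, retract):
    """fact(X) <- db(X), not retract(X).   fact(X) <- insert(X)."""
    return (set(db) - set(retract)) | set(insert)


def is_proper_update(db, insert, retract):
    """An inserted element must not be in a given database,
    and a retracted element must be in some database."""
    db = set(db)
    return not (set(insert) & db) and set(retract) <= db


# Illustrative data; c3 is a hypothetical course that is not in the source.
db = {("teaches", "c1", "n1"), ("teaches", "c2", "n2"), ("teaches", "c2", "n3")}
insert = {("teaches", "c3", "n3")}
retract = {("teaches", "c2", "n3")}
```

With these sets, `is_proper_update(db, insert, retract)` holds, and `updated_facts(db, insert, retract)` contains teaches(c1,n1), teaches(c2,n2), and teaches(c3,n3).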
Example 8 (Example 1, revisited) Figure 2 contains the meta-program encoding Example 1 (the codes for Examples 2 and 3 are similar).
As noted in Section 4.1, under any of the main semantics of abductive logic programming there is a one-to-one correspondence between repairs of the composed database DB and the Herbrand models of its encoding, the abductive meta-theory T. Consequently, abduction can be used to compute repairs. In the following sections we introduce an abductive method for this purpose.
19. Since our abductive system (see Section 4.3) accepts integrity constraints in denial form, if the elements of IC′ are not in this form, the Lloyd-Topor transformation (Lloyd & Topor, 1984) may also be applied here; we consider this case in Section 4.3.2.
[Figure 2: a meta-program for Example 1. It consists of the system definitions (defined(fact(_)), defined(db(_)), abducible(insert(_)), abducible(retract(_))), the composer, and the databases.]
Below we describe the abductive system that will be used to compute database repairs. The Asystem (Kakas, Van Nuffelen, & Denecker, 2001; Van Nuffelen & Kakas, 2001) is a tool combining abductive logic theories and constraint logic programming (CLP). It is a synthesis of the refutation procedures SLDNFA (Denecker & De Schreye, 1998) and ACLP (Kakas et al., 2000), together with an improved control strategy. The essence of the Asystem is a reduction of a high level specification to a lower level constraint store, which is managed by a constraint solver. See http://www.cs.kuleuven.ac.be/∼dtai/kt/ for the latest version of the system.²⁰ Below we review the theoretical background as well as some practical considerations behind this system. For more information, see Denecker and De Schreye (1998) and Kakas, Van Nuffelen, and Denecker (2001).

Abductive Inference
The input to the Asystem is an abductive theory T = (P, A, IC), where IC consists of universally quantified denials. The process of answering a query Q, given by a conjunction of literals, can be described as a derivation for Q through rewriting states. A state is a pair (G, ST), where G, the set of goal formulae, is a set of conjunctions of literals and denials.
During the rewriting process the elements in G (the goals) are reduced to basic formulae that are stored in the structure ST. This structure is called a store, and it consists of the following elements:²¹
• a set ∆ that contains abducibles a(t).
• a set ∆ * that contains denials of the form ∀X(← a(t) ∧ Q), where a(t) is an abducible.Such a denial may contain free variables.
• a set E of equalities and inequalities over terms.
The consistency of E is maintained by a constraint solver that uses the Martelli and Montanari unification algorithm (Martelli & Montanari, 1982) for the equalities and constructive negation for the inequalities.
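As an illustration of how equalities can be brought into solved form, here is a minimal textbook-style unification sketch in Python. It is not the constraint solver of the Asystem, and it does not cover inequalities. The term encoding is our assumption: variables are strings starting with an upper-case letter, constants are other strings, and a compound term f(t_1, ..., t_n) is the tuple ("f", t_1, ..., t_n).

```python
def is_var(t):
    """Variables are strings that start with an upper-case letter (assumed convention)."""
    return isinstance(t, str) and t[:1].isupper()


def walk(t, subst):
    """Follow variable bindings until an unbound variable or a non-variable term."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t


def occurs(v, t, subst):
    """Occurs check: does variable v appear in term t under subst?"""
    t = walk(t, subst)
    if t == v:
        return True
    if isinstance(t, tuple):
        return any(occurs(v, arg, subst) for arg in t[1:])
    return False


def unify(s, t, subst=None):
    """Return a (triangular) substitution unifying s and t, or None if none exists."""
    subst = dict(subst or {})
    stack = [(s, t)]
    while stack:
        a, b = stack.pop()
        a, b = walk(a, subst), walk(b, subst)
        if a == b:
            continue                      # trivial equation: delete
        if is_var(a):
            if occurs(a, b, subst):
                return None               # X = f(..X..) has no finite solution
            subst[a] = b                  # variable elimination
        elif is_var(b):
            stack.append((b, a))          # swap so the variable is on the left
        elif (isinstance(a, tuple) and isinstance(b, tuple)
              and a[0] == b[0] and len(a) == len(b)):
            stack.extend(zip(a[1:], b[1:]))  # decomposition
        else:
            return None                   # clash of functors or constants
    return subst
```

For instance, `unify(("teaches", "Y", "n3"), ("teaches", "c2", "n3"))` binds Y to c2, whereas `unify("n2", "n3")` fails, which corresponds to an inconsistent equality set E.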
A state S = (G, ST) is called consistent if G does not contain false and ST is consistent (since ∆ and ∆* are kept consistent with each other and with E, the latter condition is equivalent to the consistency of E). A consistent state with an empty set of goals (G = ∅) is called a solution state.
A derivation starts with an initial state (G_0, ST_0), where every element in ST_0 is empty, and the initial goal, G_0, contains the query Q and all the integrity constraints IC of the theory T. Then a sequence of rewriting steps is performed. A step starts in a certain state S_i = (G_i, ST_i), selects a goal in G_i, and applies an inference rule (see below) to obtain a new consistent state S_{i+1}. When no consistent state can be reached from S_i the derivation backtracks. A derivation terminates when a solution state is reached, otherwise it fails (see Section 4.4 below for a demonstration of this process).
Next we present the inference rules in the Asystem, using the following conventions:
• G_i^- = G_i − {F}, where F is the selected goal formula.
• OR and SELECT denote nondeterministic choices in an inference rule.
• Q is a conjunction of literals, possibly empty. Since an empty conjunction is equivalent to true, the denial ← Q with empty Q is equivalent to false.
• If ∆, ∆ * , and E are not mentioned, they remain unchanged.
The inference rules are classified in four groups, named after the leftmost literal in the selected formula (shown in bold).Each group contains rules for (positive) conjunctions of literals and rules for denials.

Defined predicates:
The inference rules unfold the bodies of a defined predicate.For positive conjunctions this corresponds to standard resolution with a selected clause, whereas in the denial all clauses are used because every clause leads to a new denial.
D.1 p(t) ∧ Q: Let p(s_i) ← B_i ∈ P (i = 1, ..., n) be n clauses with p in the head. Then:
21. The actual implementation of the Asystem also contains a store for finite domain constraint expressions. This store is not needed for the application here, and hence it is omitted.
Resolving negation corresponds to 'switching' the mode of reasoning from a positive literal to a denial and vice versa.This is similar to the idea of negation-as-failure in logic programming.
N.1 ¬p(t) ∧ Q:
The first rule is responsible for the creation of new hypotheses. Both rules ensure that the elements in ∆ are consistent with those in ∆*.
A.1 a(t) ∧ Q: SELECT an arbitrary a(s) ∈ ∆ and define
These inference rules isolate the (in)equalities, so that the constraint solver can evaluate them. The first rule applies to equalities in goal formulae:
The following three rules handle equalities in denials. Which rule applies depends on whether s or t contain free or universally quantified variables. In these rules Q[X/t] denotes the formula that is obtained from Q by substituting the term t for X.
If s and t are not unifiable then G_{i+1} = G_i^-; otherwise, let E_s be the equation set in solved form representing a most general unifier of s and t (Martelli & Montanari, 1982).
where X is a free variable and X is the set of universally quantified variables in a term t:
As usual, one has to check for floundering negation. This occurs when the inference rule N.2 is applied on a denial with universally quantified variables in the negative literal ¬p(t). Floundering aborts the derivation.
An answer substitution θ, derived from a solution state S, is any substitution θ of the free variables in S which satisfies E (i.e., θ(E) is true) and grounds ∆. Note that, in case of an abductive theory without abducibles and integrity constraints, computed answers as defined by Lloyd (1987) are most general unifiers of E and correct answers are answer substitutions as defined above.
Proposition 10 (Kakas, Van Nuffelen, & Denecker, 2001) Let T = (P, A, IC) be an abductive theory, Q a query, S a solution state of a derivation for Q, and θ an answer substitution of S. Then the pair consisting of the ground abducible atoms θ(∆(S)) and of the answer substitution θ is an abductive solution for T and Q.

Constraint Transformation to Denial Form
Since the inference rules of the Asystem are applied only on integrity constraints in denial form, the integrity constraints IC in the abductive theory T must be translated to this form. This is done by applying a variant of the Lloyd-Topor transformation (Lloyd & Topor, 1984) on the integrity constraints (see Denecker & De Schreye, 1998). This is the same procedure as the well-known procedure used in deductive databases to convert a first-order quantified query Q into a logically equivalent pair of an atomic query and a non-recursive datalog procedure. The transformation is defined as a rewriting process of sets of formulae: the initial set is {← ¬F | F ∈ IC}, and the transformation is done by applying De Morgan and various distribution rules. New predicates and rules may be introduced during the transformation in order to deal with universal quantifications in denials. Below we illustrate the transformation in the case of the integrity constraints of the running example.
Example 9 Consider the following extension of the integrity constraints of Example 1: Note that in addition to the original integrity constraint of Example 1, here we also demand that every teacher has to give at least one course.

Control Strategy
The selection strategy applied during the derivation process is crucial. A Prolog-like selection strategy (left first, depth first) often leads to thrashing, because it is blind to other choices, and it does not result in a global overview of the current state of the computation.
In the development of the Asystem the main focus was on the improvement of the control strategy. The idea is to apply first those rules that result in a deterministic change of the state, so that information is propagated. If no such rule is applicable, then one of the remaining choices is selected. By this strategy, commitment to a choice is suspended until the moment where no other information can be derived in a deterministic way. This resembles a CLP-solver, in which the constraints propagate their information as soon as a choice is made. This propagation can reduce the number of choices to be made and thus often dramatically increases the performance.

Implementation
In this section we describe the structure of our implementation. Figure 3 shows a layered view. The upper-most level consists of the specific abductive logic theory of the integration task, i.e., the database information and the integrity constraints. This layer together with the composer form the abductive meta-theory (see Section 4.2) that is processed by the Asystem. As noted above, the composer consists of a meta-theory for integrating the databases in a coherent way. It is interpreted here as an abductive theory, in which the abducible predicates provide the information on how to restore the consistency of the amalgamated data.

[Figure 3: a layered view of the implementation, including the composer.]
The abductive system (enclosed by dotted lines in Figure 3) consists of three main components: a finite domain constraint solver (part of Sicstus Prolog), an abductive meta-interpreter (described in the previous sections), and an optimizer.
The optimizer is a component that, given a preference criterion on the space of the solutions, computes only the most-preferred (abductive) solutions. Given such a preference criterion, this component prunes 'on the fly' those branches of the search tree that lead to solutions that are worse than others that have already been computed. This is actually a branch and bound 'filter' on the solution space, which speeds up execution and makes sure that only the desired solutions will be obtained.²³ If the preference criterion is a pre-order, then the optimizer is complete, that is, it can compute all the optimal solutions (more about this in Section 4.5). Moreover, this is a general-purpose component, and it may be useful not only for data integration, but also for, e.g., solving planning problems.
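The branch and bound idea behind the optimizer can be sketched in a few lines of Python for the cardinality criterion. This is a generic illustration and not the optimizer of the Asystem: the search space is simply the set of subsets of a list of candidate atoms, and `is_solution` is a hypothetical user-supplied test.

```python
def minimal_cardinality_solutions(atoms, is_solution):
    """Enumerate subsets of `atoms` depth-first and return all solutions of
    minimal cardinality, pruning branches that are already too large."""
    best_size = None   # size of the smallest solution found so far
    found = []         # solutions of that size

    def search(index, chosen):
        nonlocal best_size
        # Bound: a branch holding more atoms than the best known solution
        # cannot lead to a cardinality-minimal solution.
        if best_size is not None and len(chosen) > best_size:
            return
        if index == len(atoms):
            if is_solution(frozenset(chosen)):
                if best_size is None or len(chosen) < best_size:
                    best_size = len(chosen)
                    found.clear()          # earlier solutions were larger
                found.append(frozenset(chosen))
            return
        search(index + 1, chosen)                    # leave atoms[index] out
        search(index + 1, chosen + [atoms[index]])   # abduce atoms[index]

    search(0, [])
    return found


# Hypothetical problem: a solution must contain "a", or both "b" and "c".
atoms = ["a", "b", "c"]
solutions = minimal_cardinality_solutions(
    atoms, lambda s: "a" in s or {"b", "c"} <= s
)
```

Here the solution {b, c} is found first and later discarded when the smaller solution {a} is reached; every branch that already contains two atoms is pruned from then on.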

Complexity
It is well-known that in general, the task of repairing a database is not tractable, as there may be an exponential number of different ways of repairing it. Even in cases where integrity constraints are assumed to be single-headed dependencies (Greco & Zumpano, 2000), checking whether there exists a ≤-repaired database in which a certain query Q is satisfied is in Σ^P_2. Checking if a fact is satisfied by all the ≤-repaired databases is in Π^P_2 (see Greco & Zumpano, 2000). This is not surprising in light of the correspondence between computations of ≤-minimal repairs and computations of entailment relations defined by maximally consistent models (see Propositions 5-8), also known to be on the second level of the polynomial hierarchy.
A pure upper bound for the Asystem is still unknown, since, to the best of our knowledge, no complexity results on the SLDNFA refutation procedure are available.

Example: A Derivation of Repairs by the Asystem
Consider again Example 9. The corresponding meta-theory (assuming that the Lloyd-Topor transformation has been applied on it) is given in Figure 4. In this case, and in what follows, we shall assume that all variables in the denials are universally quantified, and so, in order to reduce the amount of notation, universal quantifiers are omitted from the denial rules.
We have executed the code of Figure 4, as well as other examples from the literature, in our system. As Theorem 2 in Section 4.5 guarantees, the output in each case is the set of the most preferred solutions of the corresponding problem. In what follows we demonstrate how some of the most preferred solutions for the meta-theory above are computed.
[Figure 4 (database section of the meta-theory): db(teacher(n_1)), db(teacher(n_2)), db(teacher(n_3)), db(teaches(c_1,n_1)), db(teaches(c_2,n_2)), db(teaches(c_2,n_3)).]
We follow one branch in the refutation tree, starting from the initial state (G_0, ST_0), where the initial set of goals is G_0 = {'true', ic1, ic2, composer-ic1, composer-ic2}, and the initial store is ST_0 = (∅, ∅, ∅). Suppose that the first selected formula is [...]. Then, by D.2, [...] and ST_1 = ST_0. Now, pick [...]. Select db(teaches(X,Y)), unfold all the corresponding atoms in the database, and then, again by D.2, followed by E.2 and E.3, [...] and still ST_2 = ST_1. Pick then the second denial among the new goals that were added to G_2. Denote this denial F_3. Since F_3 starts with a negated literal, N.2 applies, and the derivation process splits here into two branches. The second branch contains [...] and still ST_3 = ST_2. Choose now the first new goal, i.e., [...]. Now, since ∆_3 = ∅, the only option is to add F_4 to ∆*_3. Thus, by A.2, [...]. Assume, now, that we take the second new goal of G_3: [...]. Following a similar process of unfolding data as described above, using db(teaches(c_2, n_3)), we end up with [...]. Selecting the negative literal (n_2 ≠ n_3), N.2 applies again. The first branch quickly results in failure after adding (n_2 = n_3) to E.
The second branch adds ← (n_2 = n_3) and retract(teaches(c_2, n_3)) to the set of goals. The former is added to the constraint store as (n_2 ≠ n_3), and simplifies to true. Assume the latter is selected next. Let this be the i-th step. By now ∆_{i−1} (the set of abducible predicates produced until the current step) is empty, thus the only option is to abduce retract(teaches(c_2, n_3)). Thus, by A.1, ST_i consists of: [...]. As the last goal is certainly satisfied, ic1 is resolved in this branch. Now we turn to ic2. So: [...]. The evaluation of F_{i+1} for either x = n_1 or x = n_2 is successful, so the only interesting case is when x = n_3. In this case the evaluation leads to the goal gives_courses(X). Unfolding this goal yields that fact(teaches(Y, n_3)) appears in the goal set. In order to satisfy this goal, it should be resolved with one of the composer's rules (using D.1). The first rule (i.e., fact(X) ← db(X) ∧ ¬retract(X)) leads to a failure (since retract(teaches(c_2, n_3)) is already in ∆), and so the second rule of the composer, fact(X) ← insert(X), must be applied. This leads to the abduction of insert(teaches(Y, n_3)). By ic1, Y ≠ c_1 and Y ≠ c_2 is derived.²⁴ Also, composer-ic1 and composer-ic2 are satisfied by the current state, so eventually the solution state that is reached from the derivation path described here contains the following sets: [...], which means retraction of teaches(c_2, n_3) and insertion of teaches(Y, n_3) for some Y ≠ c_1 and Y ≠ c_2. The other solutions are obtained in a similar way.
Note 2 Below are some remarks on the above derivation process.
1. The solution above contains a non-ground abducible predicate. This indeed is the expected result, since this solution resolves the contradiction with the integrity constraint ic1 by removing the assumption that teacher n_3 teaches course c_2. As a result, teacher n_3 does not teach any course. Thus, in order to assure the other integrity constraint (ic2), the solution indicates that n_3 must teach some course (other than c_1 and c_2).
2. One possible (and realistic) explanation for the cause of the inconsistency in the database of Example 9 and Figure 4 is a typographic error. It might happen, for instance, that c_2 was mistakenly typed instead of, say, c_3, in teaches(c_2,n_3). In this case, the database repair computed above pinpoints this possibility (in our case, then, Y should be equal to c_3).²⁵ This explanation cannot be explicitly captured, unless particular repairs with non-ground solutions are constructed, as indeed is the case here. While some other approaches that have been recently introduced (e.g., Bravo & Bertossi, 2003; Cali, Lembo, & Rosati, 2003) properly capture cases such as those of Example 9, to the best of our knowledge, no other application of database integration has this ability.
3. Once the system finds a solution that corresponds to a goal state S_g = (G_g, ST_g) with G_g = ∅ and ST_g = (∆_g, ∆*_g, E_g), the ≤_c-optimizer may be used such that whenever a state S = (G_s, (∆_s, ∆*_s, E_s)) is reached, and |∆_g| < |∆_s|, the corresponding branch of the tree is pruned.²⁶
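The kind of repairs discussed above can also be cross-checked by brute force on the small database of Figure 4. The Python sketch below is not the SLDNFA derivation; it enumerates ground updates only. Several ingredients are our assumptions: ic1 is read as "a course is taught by at most one teacher" (as suggested by the derivation), ic2 as "every teacher gives at least one course", teacher facts may also be retracted, and the non-ground variable Y is replaced by a single hypothetical fresh course c3.

```python
from itertools import combinations

TEACHERS = ["n1", "n2", "n3"]
COURSES = ["c1", "c2", "c3"]   # c3 is a hypothetical course standing in for Y

DB = {("teacher", "n1"), ("teacher", "n2"), ("teacher", "n3"),
      ("teaches", "c1", "n1"), ("teaches", "c2", "n2"), ("teaches", "c2", "n3")}

# All ground facts that an update may touch.
UNIVERSE = sorted({("teacher", n) for n in TEACHERS}
                  | {("teaches", c, n) for c in COURSES for n in TEACHERS})


def consistent(facts):
    teaches = [f for f in facts if f[0] == "teaches"]
    # ic1 (assumed reading): a course is taught by at most one teacher.
    for c in COURSES:
        if len({f[2] for f in teaches if f[1] == c}) > 1:
            return False
    # ic2: every teacher gives at least one course.
    for f in facts:
        if f[0] == "teacher" and not any(t[2] == f[1] for t in teaches):
            return False
    return True


def cardinality_minimal_repairs():
    """Try updates of increasing size; return all repairs of the first size that works."""
    for size in range(len(UNIVERSE) + 1):
        repairs = []
        for changed in combinations(UNIVERSE, size):
            changed = set(changed)
            insert = changed - DB      # changed facts not in DB are inserted
            retract = changed & DB     # changed facts in DB are retracted
            if consistent((DB - retract) | insert):
                repairs.append((frozenset(insert), frozenset(retract)))
        if repairs:
            return repairs
    return []


repairs = cardinality_minimal_repairs()
```

Under these assumptions, the ground instance of the solution derived above, i.e., retracting teaches(c2,n3) and inserting teaches(c3,n3), is among the repairs returned.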

Soundness and Completeness
In this section we give some soundness and completeness results for the Asystem, and relate these results to the model-based preferential semantics, considered in Section 3.
In what follows we denote by T an abductive meta-theory (constructed as described in Section 4.2) for composing n given databases DB_1, ..., DB_n. Let also Proc_ALP be some sound abductive proof procedure for T.²⁷ The following proposition shows that Proc_ALP provides a coherent method for integrating the databases that are represented by T.
Proposition 11 Every abductive solution that is obtained by Proc_ALP for the query 'true' on a theory T is a repair of UDB.
Proof: By the construction of T it is easy to see that all the conditions that are listed in Definition 5 are satisfied. Indeed, the first two conditions are assured by the integrity constraints of the composer. The last condition is also met since by the soundness of Proc_ALP it produces abductive solutions ∆_i for a query 'true' on T. Thus, by the second property in Definition 14, for every such solution ∆_i = (Insert_i, Retract_i) we have that P ∪ ∆_i |= IC. Since P contains a data section with all the facts, it follows that D ∪ ∆_i |= IC, i.e., every integrity constraint follows from D ∪ Insert_i \ Retract_i. □
As SLDNFA is a sound abductive proof procedure (Denecker & De Schreye, 1998), it can be taken as the procedure Proc_ALP, and so Proposition 11 provides a soundness theorem for the current implementation of the Asystem. When an optimizer is incorporated in the Asystem, we have the following soundness result for the extended system:
Theorem 1 (Soundness) Every output that is obtained by the query 'true' on T, where the Asystem is executed with a ≤_c-optimizer [respectively, with an ≤_i-optimizer], is a ≤_c-preferred repair [respectively, an ≤_i-preferred repair] of UDB.
Proof: Follows from Proposition 11 (since the Asystem is based on SLDNFA, which is a sound abductive proof procedure), and the fact that the ≤_c-optimizer prunes paths that lead to solutions that are not ≤_c-preferable. Similar arguments hold for systems with an ≤_i-optimizer. □
Proposition 12 Suppose that the query 'true' has a finite SLDNFA-tree w.r.t. T. Then every ≤_c-preferred repair and every ≤_i-preferred repair of UDB is obtained by running T in the Asystem.

Outline of proof:
The proof that all the abductive solutions with minimal cardinality are obtained by the system is based on Theorem 10.1 of Denecker and De Schreye (1998), where it is shown that SLDNFA^o, which is an extension of SLDNFA aimed at computing solutions with minimal cardinality, is complete; see Denecker and De Schreye (1998, Section 10.1) for further details. Similarly, the proof that all the abductive solutions which are minimal w.r.t. set inclusion are obtained by the system is based on Theorem 10.2 of Denecker and De Schreye (1998), which shows that SLDNFA^+, another extension of SLDNFA, aimed at computing minimal solutions w.r.t. set inclusion, is also complete; see Denecker and De Schreye (1998, Section 10.2) for further details. Now, the Asystem is based on the combination of SLDNFA^o and SLDNFA^+. Moreover, as this system does not change the refutation tree (but only controls the way rules are selected), Theorems 10.1 and 10.2 in Denecker and De Schreye (1998) are applicable in our case as well. Thus, all the ≤_c- and the ≤_i-minimal solutions are produced. This in particular means that every ≤_c-preferred repair as well as every ≤_i-preferred repair of UDB is produced by our system. □
It should be noted that the last proposition does not guarantee that non-preferred repairs will not be produced (as this is not true in general). However, as the following theorem shows, the use of an optimizer excludes this possibility.
Theorem 2 (Completeness) In the notations of Proposition 12 and under its assumptions, the output of the execution of T in the Asystem together with a ≤_c-optimizer [respectively, together with an ≤_i-optimizer] is exactly !(UDB, ≤_c) [respectively, !(UDB, ≤_i)].
Proof: We shall show the claim for the case of ≤_c; the proof w.r.t. ≤_i is similar.
Let (Insert, Retract) ∈ !(UDB, ≤_c). By Proposition 12, ∆ = (Insert, Retract) is one of the solutions produced by the Asystem for T. Now, during the execution of the system together with the ≤_c-optimizer, the path that corresponds to ∆ cannot be pruned from the refutation tree, since by our assumption (Insert, Retract) has a minimal cardinality among the possible solutions, so the pruning condition is not satisfied. Thus ∆ will be produced by the ≤_c-optimized system. For the converse, suppose that (Insert, Retract) is some repair of UDB that is produced by the ≤_c-optimized system. Suppose for a contradiction that (Insert, Retract) ∉ !(UDB, ≤_c). By the proof of Proposition 12, there is some ∆′ = (Insert′, Retract′) ∈ !(UDB, ≤_c) that is constructed by the Asystem for T, and (Insert′, Retract′) <_c (Insert, Retract). But |∆′| < |∆|, and so the ≤_c-optimizer would prune the path of the ∆ solution once its cardinality becomes bigger than |∆′|. This contradicts our assumption that (Insert, Retract) is produced by the ≤_c-optimized system. □
Note 3 The SLDNFA-resolution on which the Asystem is based is an extension of SLDNF-resolution (Lloyd, 1987) and coincides with it for logic programs with empty sets of abducible predicates. SLDNF-resolution is complete only if its computation always terminates. SLDNFA inherits this property. This is the reason why the condition of a finite SLDNFA-tree is imposed in Proposition 12 and Theorem 2. As with SLDNF, the termination of SLDNFA can be guaranteed by imposing syntactic conditions on the program. We refer to Verbaeten (1999), where some conditions are proposed to guarantee the existence of a finite SLDNFA-tree.
In the context of our paper, floundering would arise in the presence of unsafe integrity constraints (e.g., ∀x p(x)).One way to eliminate this problem is to use a unary domain predicate dom, ranging over the objects of the database, and to add a range for each quantified variable in the integrity constraints, so that we obtain formulae of the form ∀x(dom(x) → ψ(x)) and ∃x(dom(x) ∧ ψ(x)).
The following results immediately follow from the propositions above and those of Section 3 (unless explicitly stated otherwise, the Asystem is without optimizer).
Corollary 1 Suppose that the query 'true' has a finite SLDNFA refutation tree w.r.t. the input theory T. Then:
Corollary 2 In the notations of Corollary 1 and under its assumption, we have that:
1. for every output (Insert, Retract) that is obtained by running the Asystem together with an ≤_i-optimizer [respectively, together with a ≤_c-optimizer], there is an ≤_i-maximally consistent element [respectively, a ≤_c-maximally consistent element] N in M UDB s.t. Insert N = Insert and Retract N = Retract.
2. for every ≤_i-maximally consistent element [respectively, ≤_c-maximally consistent element] N in M UDB there is a solution (Insert, Retract) that is obtained by running the Asystem together with an ≤_i-optimizer [respectively, together with a ≤_c-optimizer] s.t. Insert = Insert N and Retract = Retract N.
The last corollaries show that the operational semantics, induced by the Asystem, can also be represented by a preferential semantics, in terms of preferred models of the theory. The set R(UDB, ≤) that represents the intended meaning of how to '≤-recover' the database UDB can therefore be obtained computationally, by the set {(Insert, Retract) | (Insert, Retract) is an output of the Asystem with an ≤-optimizer}, or, equivalently, it can be described in terms of preferred models of the theory, by the following set:

Handling Specialized Information
The purpose of this section is to demonstrate the potential usage of our system in more complex scenarios, where various kinds of specialized data are incorporated in the system. In particular, we briefly consider time information and source identification. We also give some guidelines on how to extend the system with capabilities of handling these kinds of information.

Timestamped Information
Many database applications contain temporal information. This kind of data may be divided into two types: time information that is part of the data itself, and time information that is related to database operations (e.g., records of database update time). Consider, for instance, birth_day(John, 15/05/2001)_{16/05/2001}. Here, John's date of birth is an instance of the former type of time information, and the subscripted data, which describes the time at which this fact was added to the database, is an instance of the latter type of time information.
In our approach, timestamp information can be integrated by adding a temporal theory describing the state of the database at any particular time point. One way of doing so is by using situation calculus. In this approach a database is described by some initial information and a history of events performed during the database lifetime (see Reiter, 1995). Here we use a different approach, which is based on event calculus (Kowalski & Sergot, 1986). The idea is to make a distinction between two kinds of events, add_db and del_db, that describe the database modifications, and the composer-driven events insert and retract that are used for constructing database repairs. In this view, the extended composer has the following form:
holds_at(P,T) ← initially(P) ∧ ¬clipped(0,P,T)
holds_at(P,T) ← add(P,E) ∧ E<T ∧ ¬clipped(E,P,T)
clipped(E,P,T) ← del(P,C) ∧ E≤C ∧ C<T
add(P,T) ← add_db(P,T)
add(P,T) ← insert(P,T)
del(P,T) ← del_db(P,T)
del(P,T) ← retract(P,T)
← insert(P,T) ∧ retract(P,T)
← insert(P,T) ∧ add_db(P,T)
← retract(P,T) ∧ del_db(P,T)
Note that in the above extended representation, the integrity constraints must be carefully specified. Consider, e.g., the statement that a person can be born only on one date:
← holds_at(birth_day(P,D1),T) ∧ holds_at(birth_day(P,D2),T) ∧ D1 ≠ D2
The problem here is that to ensure consistency, this constraint must be checked at every point in time. This may be avoided by a simple rewriting that ensures that the constraint will be verified only when an event for that person occurs:
ic(P,T) ← holds_at(birth_day(P,D1),T) ∧ holds_at(birth_day(P,D2),T) ∧ D1 ≠ D2
← add_db(birth_day(P,_),T) ∧ NT = T+1 ∧ ic(P,NT)
← insert(birth_day(P,_),T) ∧ NT = T+1 ∧ ic(P,NT)
← ic(P,0)
Note 4 In the last example we have used temporal integrity constraints in order to resolve contradicting update events. Clearly, contradicting events do not necessarily yield a classically inconsistent database, and so the role of such integrity constraints is to express possible events in terms of time and causation, and, if necessary, describe their consequence as a violation of consistency.
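The holds_at and clipped rules of the extended composer can be read operationally, as the following Python sketch suggests. It is an illustration under our own encoding and not the authors' implementation: `adds` is assumed to collect both add_db and insert events, `dels` both del_db and retract events, each as (fact, time) pairs with integer time points, and the sample facts are hypothetical.

```python
def clipped(e, p, t, dels):
    """clipped(E,P,T) <- del(P,C), E <= C, C < T."""
    return any(q == p and e <= c < t for (q, c) in dels)


def holds_at(p, t, initially, adds, dels):
    """holds_at(P,T) <- initially(P), not clipped(0,P,T).
    holds_at(P,T) <- add(P,E), E < T, not clipped(E,P,T)."""
    if p in initially and not clipped(0, p, t, dels):
        return True
    return any(q == p and e < t and not clipped(e, p, t, dels)
               for (q, e) in adds)


# Hypothetical event history: two conflicting birth dates for the same person.
b1 = ("birth_day", "john", "15/05/2001")
b2 = ("birth_day", "john", "16/05/2001")
initially = set()
adds = [(b1, 3), (b2, 7)]   # add_db or insert events
dels = [(b1, 5)]            # del_db or retract events
```

In this history b1 holds at time 4 but not at time 6, because the deletion at time 5 clips it, and b2 holds from time 8 on, so the two birth dates never hold simultaneously.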
Instead of using temporal integrity constraints and event calculus, one could repair a database with time-stamps by using some time-based criterion for making preferences among its repairs.For instance, denote by db(x 1 , . . ., x n ) t that the data-fact db(x 1 , . . ., x n ) has a timestamp t, and suppose that (Insert, Retract) and (Insert ′ , Retract ′ ) are two repairs of a database (D, IC).A time-based criterion for preferring (Insert, Retract) over (Insert ′ , Retract ′ ) could state, e.g., that for every data-fact db(x 1 , . . ., x n ) and timestamps t 1 , t 2 s.t.db(x 1 , . . ., x n ) t 1 follows from D ∪ Insert \ Retract and db(x 1 , . . ., x n ) t 2 follows from D ∪ Insert ′ \ Retract ′ , necessarily t 1 ≥ t 2 .A more detailed treatment of this issue is outside the scope of this paper.
The interested reader may refer, e.g., to (Sripada, 1995; Mareco & Bertossi, 1999) for a detailed discussion on the use of logic-programming-based approaches to the specification of temporal databases. Such specifications can be easily combined with those for repairs, given above.
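The time-based preference criterion mentioned above (t_1 ≥ t_2 for every data-fact that survives in both repaired databases) can be sketched as a simple comparison. This is only an illustration under stated assumptions: a database is modelled as a set of (fact, timestamp) pairs, "follows from" is simplified to membership, and D ∪ Insert \ Retract is read as (D ∪ Insert) \ Retract. The function names and the sample data are hypothetical.

```python
def apply_repair(d, insert, retract):
    """Return the repaired set of (fact, timestamp) pairs, (D ∪ Insert) \\ Retract."""
    return (set(d) | set(insert)) - set(retract)

def prefers(d, repair, other):
    """True if `repair` is preferred over `other` by the time-based criterion:
    every fact kept by both repairs carries, in `repair`, a timestamp that is
    at least as recent as any timestamp it carries in `other`."""
    left = apply_repair(d, *repair)
    right = apply_repair(d, *other)
    return all(t1 >= t2
               for (f1, t1) in left
               for (f2, t2) in right
               if f1 == f2)

# Hypothetical data: the same data-fact was recorded at times 1 and 5.
fact = ("address", "ann", "paris")
D = {(fact, 1), (fact, 5)}
keep_recent = (set(), {(fact, 1)})  # (Insert, Retract): drop the older copy
keep_old = (set(), {(fact, 5)})     # (Insert', Retract'): drop the newer copy
```

Here `prefers(D, keep_recent, keep_old)` holds while the converse does not, which matches the intuition 'in case of collisions, prefer the more recent data'.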

Keeping Track of Source Identities
There are cases in which it is important to preserve the identity of the database from which a specific piece of information originated. This is useful, for instance, when one wants to make preferences among different sources, or when some specific source should be filtered out (e.g., when the corresponding database is not available or becomes unreliable). This kind of information may be encoded by adding another argument to every fact, which denotes the identity of its origin. This requires minor modifications in the basic composer, since the composer controls the way in which the data is integrated. As such, it is the only component that can keep track of the source of the information.
Suppose, then, that for every database fact we add another argument that identifies its source. I.e., db(X,S) denotes that X is a fact that originates from a database S. The composer then has the following form:

Note that the composer considers itself as an extra source that inserts brand new data facts. Now it is possible, e.g., to trace information that comes from a specific source, make preferences among different sources (by specifying appropriate integrity constraints), and filter data that comes from certain sources. The last property is demonstrated by the next rule:

validFact(X) ← fact(X,S) ∧ trusted_source(S)

where trusted_source enumerates all reliable sources of the data. Note that the last example of 'source identification' can be further extended in order to make preferences among different sources (and not only to ignore some unreliable sources). By introducing a new predicate, trust(Source,Amount), that attaches a certain level of reliability to each source, it is possible, in case of conflicts, to prefer sources with higher reliability as follows:

← fact(X,S) ∧ db(X,S0) ∧ S ≠ S0 ∧ more_trusted(S0,S)
more_trusted(S0,S) ← trust(S0,A0) ∧ trust(S,A) ∧ A0 > A

This method is particularly useful when the integrity constraint above acts as a functional dependency on specific facts. The following example (originally introduced in Subrahmanian, 1994) demonstrates this.
Example 10 Consider the following simple scenario of 'target recognition', where three sensors of an autonomous vehicle, which have different degrees of reliability, should identify objects in the vehicle's neighborhood:

trust(radar,10)
trust(gunchar,8)
trust(speedometer,5)
db(observe(object1,t72),radar)
db(observe(object1,t60),gunchar)
db(observe(object1,t80),speedometer)

As the radar has the highest reliability, its observation will be preserved. The observations of the other sensors will be retracted from the database.
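The outcome of Example 10 can be reproduced by a small procedural sketch in Python. It is a direct shortcut and not the abductive derivation of the system: a fact is retracted whenever a strictly more trusted source reports a different value for the same object. The functional dependency (the object determines the observed type) is an assumption made explicit through the hypothetical `key` parameter; the names TRUST, DB, and resolve are likewise illustrative.

```python
# Reliability levels and observations taken from Example 10.
TRUST = {"radar": 10, "gunchar": 8, "speedometer": 5}
DB = [
    (("observe", "object1", "t72"), "radar"),
    (("observe", "object1", "t60"), "gunchar"),
    (("observe", "object1", "t80"), "speedometer"),
]

def more_trusted(s0, s, trust=TRUST):
    """more_trusted(S0,S): source S0 has a strictly higher trust level than S."""
    return trust[s0] > trust[s]

def resolve(db, key, trust=TRUST):
    """Split db into (kept, retracted).

    A fact from source S is retracted if a more trusted source S0 reports a
    different fact with the same key (the assumed functional dependency)."""
    kept, retracted = [], []
    for fact, src in db:
        beaten = any(key(f0) == key(fact) and f0 != fact
                     and more_trusted(s0, src, trust)
                     for f0, s0 in db)
        (retracted if beaten else kept).append((fact, src))
    return kept, retracted

# The first two components (predicate name and object) act as the key.
kept, retracted = resolve(DB, key=lambda f: f[:2])
```

Only the radar observation ends up in `kept`; the observations of gunchar and speedometer are placed in `retracted`, in agreement with the example.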

Discussion and an Overview of Related Works
The interest in systems for coherent integration of databases has been continuously growing in the last few years (see, e.g., Olivé, 1991; Baral et al., 1991, 1992; Revesz, 1993; Subrahmanian, 1994; Bry, 1997; Gertz & Lipeck, 1997; Messing, 1997; Lin & Mendelzon, 1998; Liberatore & Schaerf, 2000; Ullman, 2000; Greco & Zumpano, 2000; Greco et al., 2001; Franconi et al., 2001; Lenzerini, 2001, 2002; Arenas et al., 1999, 2003; Bravo & Bertossi, 2003; Cali et al., 2003, and many others). Already in the early works on this subject it became clear that the design of systems for data integration is a complex task, which demands solutions to many questions from different disciplines, such as belief revision, merging and updating, reasoning with inconsistent information, constraint enforcement, query processing and, of course, many aspects of knowledge representation. In this section we shall address some of these issues.
One important aspect of data integration systems is how concepts in the independent (stand-alone) data-sources and those of the unified database are mapped to each other. A proper specification of the relations between the source schemas and the schema of the amalgamated data exempts the potential user from being aware of where and how data is arranged in the sources. One approach for this mapping, sometimes called global-centric or global-as-view (Ullman, 2000), requires that the unified schema should be expressed in terms of the local schemas. In this approach, every term in the unified schema is associated with a view (alternatively, a query) over the sources. This approach is taken by most of the systems for data integration, as well as ours. The main advantage of this approach is that it induces a simple query processing strategy that is based on unfolding of the query, and uses the same terminology as that of the databases. This indeed is the case in the abductive derivation process, defined in Section 4.3.1. The other approach, sometimes called source-centric or local-as-view (used, e.g., in Bertossi et al., 2002), considers every source as a view over the integrated database, and so the meaning of every source is obtained by concepts of the global database. In particular, the global schema is independent of the distributed ones. This implies that adding a new source to the system requires only local definitions and does not necessarily involve changes in the global schema. The main advantage of the latter approach is, therefore, that it provides a better setting for maintenance. For a detailed discussion on this topic, see (Ullman, 2000; Lenzerini, 2001; Cali et al., 2002; Van Nuffelen et al., 2004). More references and a survey on different approaches to data integration appear in the papers of Batini, Lenzerini, and Navathe (1986), Rahm and Bernstein (2001), and Lenzerini (2002).
Another major issue that has to be addressed is the ability of data integration systems to properly cope with dynamically evolving worlds. In particular, the domain of discourse should not be fixed in advance, and information may be revised on a regular basis. The last issue is usually handled by methods of belief revision (Alchourrón et al., 1995; Gärdenfors & Rott, 1995) and nonmonotonic reasoning. In the context of belief revision it is common to make a distinction between revisions of integrity constraints and changes in the sets of the data-facts, since the two types of information are of a different nature and thus may require different approaches for handling dynamic changes. When the set of integrity constraints is given in a clause form, methods of dynamic logic programming (Alferes et al., 2000, 2002) may be useful for handling revisions. As noted in (Alferes et al., 2002), assuming that each local database is consistent (as in our case), dynamic logic programming (together with a proper language for implementing it, like LUPS (Alferes et al., 2002)) provides a way of avoiding contradictory information, and so this may be viewed as a method of updating a database by a sequence of integrity constraints that arrive at different time points.
When the types of changes are predictable, or can be characterized in some sense, temporal integrity constraints (in the context of temporal databases) can be used in order to specify how to treat new information. This method is also useful when the revision criteria are known in advance (e.g., 'in case of collisions, prefer the more recent data', cf. Section 4.6.1). See, e.g., (Sripada, 1995; Mareco & Bertossi, 1999) for a detailed discussion on temporal integrity constraints and temporal databases in logic-programming-based formalisms.
The second type of revisions (i.e., modifications of data-facts) is obtained here through the (preferred) repairs of the unified database, which induce corresponding modifications of data-facts. A repair is usually induced by a method of restoring (or assuring) consistency of the amalgamated database by a minimal amount of change. As in our case, the minimization criterion is often determined by the aspiration to remain 'as close as possible' to the set of the collective information. This is a typical kind of repair goal, and the standard ways of formally expressing it are by enumeration methods, such as the following 28:
• Minimizing the Hamming distance between the (propositional) models of the unified database and its repairs (Liberatore & Schaerf, 2000), or minimizing the distance between the corresponding three-valued interpretations (de Amo et al., 2002) according to a suitable generalization of Hamming distance.
• When the underlying data is prioritized, the corresponding quantitative information is also considered in the computations of distances (see, for instance, the work of Liberatore & Schaerf, 2000).
Various ways of computing (preferred/minimal) repairs are described in the literature, among which are proof-theoretical (deductive) methods (Bertossi & Schwind, 2002; de Amo et al., 2002), abductive methods (Kakas & Mancarella, 1990a; Inoue & Sakama, 1995; Sakama & Inoue, 1999, 2000), and algorithmic approaches that are based on computations of maximal consistent subsets (Baral et al., 1991, 1992), or use techniques from model-based diagnosis (Gertz & Lipeck, 1997). A common approach is to view a database as a logic program, and to adopt standard techniques of giving semantics to logic programs in order to compute database repairs. For instance, stable-model semantics on disjunctive logic programs is used for computing repairs in (Greco & Zumpano, 2000; Greco et al., 2001; Franconi et al., 2001; Arenas et al., 2003), and resolution-based procedures for integrating several annotated databases are introduced by Subrahmanian (1990, 1994). As follows from Section 4, the application introduced here is also based on an extended resolution strategy, applied on logic programs that may contain negation-as-failure operators and abducible predicates.
As repairing a database means in particular elimination of contradictions, reasoning with inconsistent information has been a major challenge for data integration systems. First, it is important to note in this respect that not every formalism for handling inconsistency is acceptable in the context of databases, even if the underlying criterion for handling inconsistency is the same as one of the repair goals mentioned above. The following example demonstrates such a case:

Example 11 (Arenas, Bertossi, & Chomicki, 1999) Consider the following (inconsistent) database: DB = ({p, q}, {¬(p ∧ q)}). In the approach of Lin (1996), for instance, p ∨ q may be inferred as the repaired database, following a strategy of minimal change. However, in this approach none of p, q, and ¬(p ∧ q) holds in the repaired database. In particular (since in (Lin, 1996) there is no distinction between data-facts and integrity constraints), the integrity constraint {¬(p ∧ q)} itself cannot be inferred, which violates the intended meaning of an integrity constraint in databases.
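For contrast, the repairs of the database of Example 11 in the sense used in this paper can be enumerated by brute force. The sketch below is in Python and is only meant for tiny propositional examples; it is not the SLDNFA-based computation of the system. It relies on the characterization Insert = M^t \ D and Retract = D \ M^t for two-valued models M of IC, and it assumes that preferred repairs are those whose set of changes Insert ∪ Retract is minimal with respect to set inclusion. The function names are hypothetical.

```python
from itertools import combinations

def candidate_repairs(atoms, d, ic):
    """Enumerate (Insert, Retract) pairs induced by the two-valued models of IC.

    `atoms` is the set of propositional atoms, `d` the set of data-facts, and
    `ic` a predicate that takes the set of true atoms of a model M."""
    repairs = []
    for r in range(len(atoms) + 1):
        for true_atoms in combinations(sorted(atoms), r):
            m = frozenset(true_atoms)
            if ic(m):
                repairs.append((m - d, d - m))  # Insert = M^t \ D, Retract = D \ M^t
    return repairs

def minimal_repairs(repairs):
    """Keep the repairs whose change set Insert ∪ Retract is inclusion-minimal."""
    def change(rep):
        return rep[0] | rep[1]
    return [r for r in repairs
            if not any(change(o) < change(r) for o in repairs)]

# Example 11: D = {p, q}, IC = {¬(p ∧ q)}.
D = frozenset({"p", "q"})
all_repairs = candidate_repairs({"p", "q"}, D,
                                ic=lambda m: not ("p" in m and "q" in m))
preferred = minimal_repairs(all_repairs)
```

Under these assumptions the two preferred repairs retract exactly one of p and q, so each repaired database keeps one data-fact and satisfies the integrity constraint ¬(p ∧ q).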
Many techniques for consistency enforcement and repairs of constraint violations have been suggested, among which are methods for resolving contradictions by quantitative considerations, such as 'majority vote' (Lin & Mendelzon, 1998; Konieczny & Pino Pérez, 2002) or qualitative ones (e.g., defining priorities on different sources of information or preferring certain data over another, as in Benferhat, Cayrol, Dubois, Lang, & Prade, 1993, and Arieli, 1999). Another common method of handling inconsistent (and incomplete) information is by turning to multi-valued semantics. Three-valued formalisms such as the one considered in Section 3 are used as a semantical basis of paraconsistent methods to construct database repairs (de Amo, Carnielli, & Marcos, 2002) and are useful in general for pinpointing inconsistencies (Priest, 1991). Other approaches use lattice-based semantics to encode within the language itself some meta-information, such as confidence factors, amount of belief for or against a specific assertion, etc. These approaches combine corresponding formalisms of knowledge representation, such as annotated logic programs (Subrahmanian, 1990, 1994; Arenas et al., 2000) or bilattice-based logics (Fitting, 1991; Arieli & Avron, 1996; Messing, 1997), together with non-classical refutation procedures (Fitting, 1989; Subrahmanian, 1990; Kifer & Lozinskii, 1992) that make it possible to detect inconsistent parts of a database and maintain them.

Summary and Future Work
In this paper we have developed a formal declarative foundation for rendering coherent data, provided by different databases, and presented an application that implements this approach. Like similar applications (e.g., Subrahmanian, 1994; Bertossi, Arenas, & Ferretti, 1998; Greco & Zumpano, 2000; Liberatore & Schaerf, 2000), our system mediates among the sources of information and also between the reasoner and the underlying data.
Composition of several data-sources is encoded by meta-theories in the form of abductive logic programs, and it is possible to extend these theories by providing meta-information on the data-facts, such as time-stamps and source identities. Moreover, since the reasoning process of the system is based on a pure generalization of classical refutation procedures, no syntactical embedding of first-order formulae into other languages, nor any extension of two-valued semantics, is necessary.
Due to the inherent modularity of the system, each component is independent and can be modified to meet different needs. Thus, for instance, the underlying solver may be replaced with any other solver that is capable of dealing with the meta-theory, and any improvement of the optimizer will affect the whole system and its efficiency, regardless of the nature of its input. Also, the way of keeping data coherent is encapsulated in the component that integrates the data (i.e., the composer). This implies, in particular, that neither input from the reasoner nor any other external policy for making preferences among conflicting sources is required in order to resolve contradictions.
As we have shown, the operational semantics for inconsistent databases, induced by the Asystem, is strongly related to (multi-valued) preferential semantics. As preferential semantics provides the background for many non-monotonic and paraconsistent formalisms (e.g., Shoham, 1988; Priest, 1989, 1991; Kifer & Lozinskii, 1992; Arieli & Avron, 1996; Arieli, 1999, 2003), this implies that the Asystem may be useful for reasoning with general uncertain theories (not necessarily in the form of databases).
It is important to note that our composing system inherits the functionality of the underlying solver. The outcome of this is flexibility, modularity, simple interaction with different sources of information, and the ability to reason with any set of first-order formulae of integrity constraints 29. To the best of our knowledge no other application of data integration has this ability.
There are several directions for further exploration. First, as we have already noted, two more phases, which have not been considered here, might be needed for a complete process of data integration: a) translation of different concepts to a unified ontology, and b) integration of integrity constraints. So far, formalisms for dealing with the first item (e.g., Lenzerini, 2001, 2002; Van Nuffelen et al., 2004) mainly focus on the mutual relations between the global schema and the source (local) schemas, in particular how concepts of each ontology map to each other. On the other hand, formalisms for handling the second item concentrate on nonmonotonic reasoning for dynamically evolving (and mutually inconsistent) worlds. A synthesis of the main ideas behind these approaches, and incorporating them in our system, is a major challenge for future work.
Another important issue that deserves attention is the repair of inconsistency in the context of deductive databases with integrity constraints and definitions of predicates, often called view predicates. We refer to (Denecker, 2000) for a sketch of how this may be done. This kind of data may be further combined with (possibly inconsistent) temporal information, (partial) transactions, and (contradictory) update information.
Finally, since different databases may have different information about the same predicates, it is reasonable to use some weakened version of the closed world assumption as part of the integration process (for instance, an assumption that something is false unless it is in the database, or unless some other database has some information about it).

Definition 3 A formula ψ follows from a database instance D (alternatively, D entails ψ; notation: D |= ψ) if the minimal Herbrand model of D is also a model of ψ.

Definition 4 A database DB = (D, IC) is consistent if every formula in IC follows from D (notation: D |= IC).
11. Note, in particular, that in terms of Definition 2, if ν = H_D and x = t, we have that ν^x = D.

Proof: The definitions of Insert and Retract immediately imply that Insert ∩ D = ∅ and Retract ⊆ D. For the last condition in Definition 5, note that D ∪ Insert \ Retract = D ∪ (M^t \ D) \ (D \ M^t) = M^t. It follows that M is the least Herbrand model of D ∪ Insert \ Retract and it is also a model of IC, therefore D ∪ Insert \ Retract |= IC. □

Proposition 2 Let (Insert, Retract) be a repair of a database (D, IC). Then there is a two-valued model M of IC such that Insert = M^t \ D and Retract = D \ M^t.

Figure 1: The structure THREE

Intuitively, the elements t and f in THREE correspond to the usual classical values true and false, while the third element, ⊤, represents inconsistent information (or belief).

Figure 2: A meta-program for Example 1

Figure 3: A schematic view of the system components.
1. for every output (Insert, Retract) of the Asystem there is a classical model M of IC s.t. Insert = M^t \ D and Retract = D \ M^t.
2. for every output (Insert, Retract) of the Asystem there is a 3-valued model N of D ∪ IC s.t. Insert N = Insert and Retract N = Retract.
repaired database of DB = (D, IC), if the set dist(D′, D) is minimal (w.r.t. set inclusion) among all the sets of the form dist(D′′, D), where D′′ |= IC. Similarly, if |S| denotes the size of S, then DB′