Developing Approaches for Solving a Telecommunications Feature Subscription Problem

Call control features (e.g., call-divert, voice-mail) are primitive options to which users can subscribe off-line to personalise their service. The configuration of a feature subscription involves choosing and sequencing features from a catalogue and is subject to constraints that prevent undesirable feature interactions at run-time. When the subscription requested by a user is inconsistent, one problem is to find an optimal relaxation, which is a generalisation of the feedback vertex set problem on directed graphs, and thus it is an NP-hard task. We present several constraint programming formulations of the problem. We also present formulations using partial weighted maximum Boolean satisfiability and mixed integer linear programming. We study all these formulations by experimentally comparing them on a variety of randomly generated instances of the feature subscription problem.


Introduction
Information and communication services, from news feeds to internet telephony, are playing an increasing, and potentially disruptive, role in our daily lives. As a result, service providers seek to develop personalisation solutions allowing customers to control and enrich their service. In telephony, for instance, personalisation relies on the provisioning of call control features. A feature is an increment of functionality which, if activated, modifies the basic service behaviour in systematic or non-systematic ways, e.g., do-not-disturb, multi-media ring-back tones, call-divert-on-busy, credit-card-calling.
Modern service delivery platforms provide the ability to implement features as modular applications and compose them "on demand" when setting up live sessions, that is, consistently with the feature subscriptions preconfigured by participants. The architectural style commonly found in platforms that are based on the Session Initiation Protocol (Rosenberg, Schulzrinne, Camarillo, Johnston, Peterson, Sparks, Handley, & Schooler, 2002; Sparks, 2007), notably the Internet Multimedia Subsystem (Poikselka, Mayer, Khartabil, & Niemi, 2006), consists of chaining applications between end-points. In this context, a personalisation approach consists of exposing catalogues of call-control features to subscribers and letting them select and sequence the features of their choice.
Not all sequences of features are acceptable, however, due to the possible occurrence of feature interactions (Calder, Kolberg, Magill, & Reiff-Marganiec, 2003). A feature interaction is "some way in which a feature modifies or influences the behaviour of another feature in generating the system's overall behaviour" (Bond, Cheung, Purdy, Zave, & Ramming, 2004). For instance, a do-not-disturb feature will block any incoming call and cancel the effect of any subsequent feature subscribed by the callee. This is an undesirable interaction: as shown in Figure 1, the call originating from caller X will never reach the call-logging feature of callee Y. However, if call-logging is placed before do-not-disturb then both features will play their role. Distributed Feature Composition (dfc) provides a method and a formal architecture to address feature interactions (Jackson & Zave, 1998, 2003; Bond et al., 2004). The method consists of constraining the selection and sequencing of features by prescribing constraints that prevent undesirable interactions. These feature interaction resolution constraints are represented in a catalogue as precedence or exclusion constraints. A precedence constraint, i ≺ j, means that if the features i and j are part of the same sequence then i must precede j. An exclusion constraint between i and j means that they cannot be together in any sequence. Note that an exclusion constraint between i and j can be expressed as a pair of precedence constraints i ≺ j and j ≺ i. Undesirable interactions are then avoided by rejecting any sequence that does not satisfy the catalogue precedence constraints.
Informally, a feature subscription is defined by a set of features, a set of precedence constraints specified by a user and a set of precedence constraints prescribed by the feature catalogue. The task is to find a sequence of the user-selected features subject to the catalogue precedence constraints and the user-specified precedence constraints. It may not always be possible to construct such a sequence, in which case the task is to find a relaxation of the feature subscription that is consistent and closest to the initial requirements of the user (Lesaint, Mehta, O'Sullivan, Quesada, & Wilson, 2008b). In this paper, we show that checking the consistency of a feature subscription is polynomial in time, but finding an optimal relaxation of a subscription, when inconsistent, is NP-hard.
We present several formulations of finding an optimal relaxation of a feature subscription using constraint programming. We present a simple constraint optimisation problem formulation of our problem and investigate the impact of maintaining three different levels of consistency on decision variables within depth-first branch and bound. The first one is arc consistency (Rossi, van Beek, & Walsh, 2006a), which is commonly used. The second is singleton arc consistency and the third is restricted singleton arc consistency (rsac). We also present a formulation of our problem based on a soft global constraint, which we call SoftPrec (Lesaint, Mehta, O'Sullivan, Quesada, & Wilson, 2009). We further present a formulation based on the weighted constraint satisfaction problem framework (Rossi, van Beek, & Walsh, 2006b). We also consider partial weighted maximum satisfiability (Biere, Heule, van Maaren, & Walsh, 2009), and mixed integer linear programming. We present the formulations using these approaches and discuss their differences with respect to the constraint programming formulations.
Notice that finding an optimal relaxation of a feature subscription is a generalisation of the well-known feedback vertex set problem as well as the feedback arc set problem (Garey & Johnson, 1979). Given a directed graph $G = \langle V, E \rangle$ with set of vertices $V$ and set of edges $E$, the feedback vertex (arc) set problem is to find a smallest $V' \subseteq V$ ($E' \subseteq E$) whose deletion makes the graph acyclic. Although in this paper we focus only on a particular telecommunication problem, the techniques studied here are also applicable to other domains where the feedback vertex/arc set problem is encountered, e.g., circuit design, deadlock prevention, vlsi testing, stabilization of synchronous systems (Festa, Pardalos, & Resende, 1999, Section 5). There are also applications in chemistry when it comes to sorting a list of samples of complex mixtures according to their compositions in the presence of missing data, i.e., when not all components are measured in all samples (Fried, Hordijk, Prohaska, Stadler, & Stadler, 2004).
The remainder of this paper is organised as follows. Section 2 presents the necessary background required for this paper. We introduce the notion of feature subscription in Section 3. In Section 4 we reformulate the original problem in order to relate it more easily to well-known problems existing in the literature. In Section 5 we present an algorithm for dealing with symmetries introduced when the original subscription is reformulated. We introduce the notion of relaxation of an inconsistent subscription in Section 6 and prove that finding an optimal relaxation of an inconsistent subscription is NP-hard. In Section 7 we model the problem of finding such an optimal relaxation as a constraint optimisation problem. In Section 8, we present two other constraint programming approaches based on the notions of global constraints and weighted constraint satisfaction problems. In Sections 9 and 10, the partial weighted maximum satisfiability and mixed integer linear programming formulations of the problem are described. The empirical evaluation of all these approaches is shown in Section 11. Finally our conclusions and future directions are presented in Section 12.

Background
In this section we present a set of concepts on binary relations and constraint programming that will be used in the next sections.

Binary Relations
A binary relation over a finite set $X$ is an association of elements of $X$ with elements of $X$. Let $R$ be a binary relation over a finite set $X$. The relation $R$ is irreflexive if and only if there is no $x \in X$ such that $\langle x, x \rangle \in R$. It is transitive if and only if for all $x$, $y$ and $z$ in $X$, $[\langle x, y \rangle \in R \wedge \langle y, z \rangle \in R] \Rightarrow [\langle x, z \rangle \in R]$. The transitive closure of $R$ is the smallest transitive relation on $X$ that contains $R$. We use the notation $R^*$ to denote the transitive closure of $R$. The relation $R$ is asymmetric if and only if for all $x$, $y$ in $X$, $[\langle x, y \rangle \in R] \Rightarrow [\langle y, x \rangle \notin R]$. It is total if and only if for any distinct $x$, $y$ in $X$, either $\langle x, y \rangle \in R$ or $\langle y, x \rangle \in R$. A strict partial order is a binary relation that is irreflexive and transitive. A strict total order is a binary relation that is transitive, asymmetric and total. The transpose of a relation $R$, denoted $R^{T}$, is the set $\{\langle y, x \rangle \mid \langle x, y \rangle \in R\}$. The restriction of $R$ on the set $Y$, denoted $R\downarrow_Y$, is the set $\{\langle x, y \rangle \in R \mid \{x, y\} \subseteq Y\}$. Any binary relation $R$ on a set $X$ can also be viewed as a directed graph where the nodes correspond to the elements in $X$ and ordered pairs in $R$ correspond to the edges of the graph.
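The notions above can be made concrete with a minimal Python sketch in which a relation is represented as a set of ordered pairs. The function names are illustrative and not part of the paper; the closure is computed by naive saturation rather than by an optimised algorithm.

```python
from itertools import product

def transitive_closure(rel):
    """Smallest transitive relation containing rel (a set of ordered pairs)."""
    closure = set(rel)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so that pairs can be added during the pass.
        for (x, y), (u, z) in product(list(closure), repeat=2):
            if y == u and (x, z) not in closure:
                closure.add((x, z))
                changed = True
    return closure

def is_irreflexive(rel):
    return all(x != y for x, y in rel)

def is_asymmetric(rel):
    return all((y, x) not in rel for x, y in rel)

def transpose(rel):
    return {(y, x) for x, y in rel}

def restrict(rel, subset):
    """Keep only the pairs whose two elements both belong to subset."""
    return {(x, y) for x, y in rel if x in subset and y in subset}

R = {(1, 2), (2, 3)}
closure = transitive_closure(R)
```

Viewed as a directed graph, a relation is acyclic exactly when its transitive closure is irreflexive, which is the property exploited later when checking the consistency of a subscription.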

Constraint Programming
Constraint Programming (cp) has been successfully used in many applications such as planning, scheduling, resource allocation, routing, and bio-informatics (Wallace, 1996). Problems are primarily stated as Constraint Satisfaction Problems (csps), that is, a finite set of variables with finite domains, together with a finite set of constraints. A solution of a csp is an assignment of a value to each variable such that all constraints are satisfied simultaneously. The basic approach for solving a csp instance is to use a backtracking search algorithm that interleaves two processes: constraint propagation and labelling. Constraint propagation helps in pruning values that cannot lead to a solution of the problem. Labelling involves assigning values to variables that may lead to a solution.
A binary constraint is said to be arc consistent if for every value in the domain of every variable, there exists a value in the domain of the other such that the pair of values satisfies the constraint between the variables. A non-binary constraint is generalised arc consistent if and only if for any value for a variable in its scope, there exists a value for every other variable in the scope such that the tuple satisfies the constraint (Rossi et al., 2006a). A csp is said to be Arc Consistent (ac) if all its constraints are (generalised) arc consistent. A csp is said to be Singleton Arc Consistent (sac) if it has non-empty domains and for any assignment of a variable the resulting subproblem can be made ac (Bessiere, Stergiou, & Walsh, 2008). Mixed consistency means maintaining different levels of consistency on different variables of a problem. It has been shown that maintaining sac on some variables and ac on the remaining variables of certain problems, such as job shop scheduling and radio link frequency assignment, can reduce the solution time (Lecoutre & Patrick, 2006).
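As an illustration of arc consistency on a single binary constraint, the following Python sketch removes unsupported values from two domains. The names `revise` and `make_arc_consistent` are hypothetical, and the example constraint (one position strictly less than another) is chosen only because ordering constraints of this kind arise when sequencing features.

```python
def revise(dom_x, dom_y, allowed):
    """Return the values of dom_x that have at least one support in dom_y."""
    return {a for a in dom_x if any(allowed(a, b) for b in dom_y)}

def make_arc_consistent(dom_x, dom_y, allowed):
    """Make one binary constraint arc consistent in both directions."""
    new_x = revise(dom_x, dom_y, allowed)
    # Revise y against the already reduced domain of x (arguments swapped).
    new_y = revise(dom_y, new_x, lambda b, a: allowed(a, b))
    return new_x, new_y

positions = {1, 2, 3}
dom_i, dom_j = make_arc_consistent(positions, positions, lambda a, b: a < b)
```

For a single constraint, one revision in each direction suffices, because every value kept for the first variable was supported by a value that is itself kept for the second. A full ac algorithm would repeat such revisions over all constraints until no domain changes.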
Various generalisations of csps have been developed, where the objective is to find a solution that is optimal with respect to certain criteria such as costs, preferences or priorities. One of the most significant is the Constraint Optimisation Problem (cop). Here the goal is to find an optimal solution that either maximises or minimises an objective function depending upon the problem. The simplest cop formulation retains the csp limitation of allowing only hard constraints but adds an objective function over the variables.
A depth-first branch and bound search algorithm is generally used to find a solution of a cop having an optimal value. In the case of maximisation, the branch and bound search algorithm keeps the current optimal value of the solution while traversing the search tree. This value is a lower bound on the optimal value of the objective function. At each node of the search tree, the search algorithm computes an overestimation of the global value. This value is an upper bound on the best solution that extends the current partial solution. If the lower bound is greater than or equal to the upper bound, then a solution of a greater value than the current optimal value cannot be found below the current node, so the current branch is pruned and the algorithm backtracks.

Configuring Feature Subscriptions
In Distributed Feature Composition (dfc) each feature is implemented by one or more modules called Feature Box Types (fbt) and each fbt has many run-time instances called feature boxes. For simplicity, in this paper we assume that each feature is implemented by a single feature box and we associate features with feature boxes. Dfc establishes a dialogue between endpoints by routing a set-up request encapsulating source and target addresses that are associated with source and target feature boxes respectively. Addresses may change along the way and dfc routers evolve the connection path accordingly. Starting from the feature box initiating the call, feature boxes are incorporated one after the other until a terminating box is reached. A router is used at each step to locate the next box and relay the set-up request. As shown in the third row of Figure 2, the routing method decomposes the connection path into a source and a target region and each region is further partitioned into zones. A source (target) zone is a sequence of feature boxes that execute for the same source (target) address. The first source zone is associated with the source address encapsulated in the initial set-up request, i.e., the zone of X in Figure 2. A change of source address in the source region, caused for instance by an identification feature, triggers the creation of a new source zone. If no such change occurs and the zone cannot be expanded further, routers switch to the target region. Likewise, a change of target address in the target region, as performed by Time-Dependent-Routing (tdr) in Figure 2, triggers the creation of a new target zone. If no such change occurs and the zone cannot be expanded further, as for Z in Figure 2, the request is sent to the final box identified by the encapsulated target address.
Dfc routers are only concerned with locating feature boxes and assembling zones into regions. They do not make decisions as to the type and ordering of feature boxes appearing in a zone. They simply fetch this information from the pre-configured feature subscription that is associated with the address and region of the zone and use it to construct the zone. For instance, the zone of Z in Figure 2 results from the sequence of feature box types subscribed to by Z in the target region.
Subscriptions are pre-configured from the feature catalogue published by the service provider. The catalogue is a set of features. Features are classified as source, target or reversible (i.e., a subset of features that are both source and target) based on whether they can be subscribed to in the source region, the target region or both. For instance, the catalogue shown in the first row of Figure 2 includes Originating-Call-Screening (ocs) as a source feature, Terminating-Call-Screening (tcs), Time-Dependent-Routing (tdr), and Call-Forwarding-Unconditional (cfu) as target features, and Call-Logging (cl) as a reversible feature. A source feature is activated on behalf of a caller while a target feature is activated on behalf of a callee.
Constraints are formulated by designers on pairs of source features and pairs of target features to prevent undesirable feature interactions (Zave, 2003). A precedence constraint imposes a routing order between two features. The order is specified with respect to the direction of an outgoing call if the features are source (e.g., ocs must precede cl in Figure 2) and with respect to the direction of an incoming call if the features are target (e.g., cl must precede tcs). An exclusion constraint makes two features mutually exclusive, as for the case of cl and cfu in Figure 2. We encode an exclusion constraint between two features $f_i$ and $f_j$ as the pair of precedence constraints $f_i \prec f_j$ and $f_j \prec f_i$. For the sake of simplicity, we treat precedence constraints as ordered pairs, i.e., the precedence constraint $f_i \prec f_j$ is also viewed as $\langle f_i, f_j \rangle$.
Definition 1 (Catalogue). A catalogue is a tuple $\langle \mathcal{F}_s, \mathcal{H}_s, \mathcal{F}_t, \mathcal{H}_t \rangle$ where:
• $\mathcal{F}_s$ is the finite set of source features,
• $\mathcal{F}_t$ is the finite set of target features,
• $\mathcal{F}_s \cap \mathcal{F}_t$ is the finite set of reversible features,
• $\mathcal{H}_s$ is the set of source precedence constraints over $\mathcal{F}_s$, and
• $\mathcal{H}_t$ is the set of target precedence constraints over $\mathcal{F}_t$.
The source (target) subscription associated with an address is a subset of source (target) catalogue features, a set of catalogue precedence constraints between source (target) features, and a set of user precedence constraints between source (target) features. For instance, the target subscription of Y shown in the second row of Figure 2 includes the target features tdr and tcs and the user precedence tdr ≺ tcs, meaning that tdr should appear before tcs in the connection path.
Definition 2 (Feature Subscription). Given a catalogue $\langle \mathcal{F}_s, \mathcal{H}_s, \mathcal{F}_t, \mathcal{H}_t \rangle$, a feature subscription is defined to be a pair of tuples $S_s = \langle F_s, H_s, P_s \rangle$ and $S_t = \langle F_t, H_t, P_t \rangle$ where:
• $F_s$ and $F_t$ are the user selected source and target features respectively such that $F_s \subseteq \mathcal{F}_s$, $F_t \subseteq \mathcal{F}_t$ and $F_s \cap \mathcal{F}_t = F_t \cap \mathcal{F}_s$, i.e., any reversible feature in $F_s \cup F_t$ appears in both $F_s$ and $F_t$;
• $H_s$ is the set of source catalogue precedence constraints in $F_s$ given by $H_s = \mathcal{H}_s\downarrow_{F_s}$;
• $H_t$ is the set of target catalogue precedence constraints in $F_t$ given by $H_t = \mathcal{H}_t\downarrow_{F_t}$;
• $P_s$ is the set of source user precedence constraints over $F_s$, which satisfies [condition missing in the source];
• $P_t$ is the set of target user precedence constraints over $F_t$, which satisfies [condition missing in the source].
Configuring a feature subscription involves selecting, parameterising and sequencing features in each region consistently with the catalogue constraints and other integrity rules (Jackson & Zave, 2003). In particular, the source and target regions of a subscription must include the same reversible features in inverse order, i.e., source and target regions are not configured independently.
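Definition 3 (Consistency of Feature Subscriptions). We say that a feature subscription $S = \langle F_s, H_s, P_s, F_t, H_t, P_t \rangle$ is consistent if and only if there exists a strict total order $T_s$ on $F_s$ and a strict total order $T_t$ on $F_t$ such that
(1) $T_s \supseteq H_s \cup P_s$,
(2) $T_t \supseteq H_t \cup P_t$, and
(3) for all $i, j \in F_s \cap F_t$, $\langle i, j \rangle \in T_s \Leftrightarrow \langle j, i \rangle \in T_t$.
The following configuration services may be provided to users submitting a feature subscription:
• (verification) Check the consistency of the subscription.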
• (filtering) If the feature subscription is consistent, then compute its anti-subscription, i.e., the set of features and precedence constraints that would make it inconsistent if added.
• (partial completion) If the feature subscription is consistent, then compute the transitive closure of each region, i.e., $(H_s \cup P_s)^*$ and $(H_t \cup P_t)^*$.
• (completion) If the feature subscription is consistent, then compute a pair of strict total orders on source and target features such that points 1, 2 and 3 of Definition 3 are respected.
• (relaxation) If the feature subscription is inconsistent, then suggest consistent subscriptions obtained from it by removing one or more features or user precedences.
We formalise these tasks in the next section and describe their time complexities after reformulating the original definition of feature subscription.

Reformulating the Original Definition of Feature Subscription
By definition, a catalogue includes two sets of features and two sets of precedence constraints. In this section, we reformulate a catalogue by merging its source and target feature sets and by merging its source and target precedence sets. We transform feature subscriptions accordingly and show that the consistency of a subscription is equivalent to the acyclicity of its transformation. The new definitions are simpler and this reformulation allows us to establish relations with other well-known problems existing in the literature.
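The principle of the reformulation of a catalogue is to invert the target precedences and merge them with the source precedences. Specifically, a catalogue $\langle \mathcal{F}_s, \mathcal{H}_s, \mathcal{F}_t, \mathcal{H}_t \rangle$ is reformulated as $\langle F_c, H_c \rangle$, where $F_c = \mathcal{F}_s \cup \mathcal{F}_t$ and $H_c = \mathcal{H}_s \cup \mathcal{H}_t^{T}$, with $\mathcal{H}_t^{T}$ denoting the transpose of $\mathcal{H}_t$. The definitions of (consistent) feature subscription are adapted as follows.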
Definition 4 (Feature Subscription). A feature subscription $S$ of catalogue $\langle F_c, H_c \rangle$ is a tuple $\langle F, H, P \rangle$, where $F \subseteq F_c$, $H = H_c\downarrow_F$, and $P$ is a set of (user defined) precedence constraints on $F$.
Definition 5 (Consistency of the Reformulated Feature Subscription). A feature subscription $\langle F, H, P \rangle$ of a catalogue $\langle F_c, H_c \rangle$ is defined to be consistent if and only if there exists a total order $T$ on $F$ such that $T \supseteq H \cup P$.
… by transitivity of $T^o_s$. But then we still have a cycle if we omit $f_i$, which contradicts the minimality of the cycle length $k$. We have shown, for all $i \geq 1$, that $f_i \in F^o_t$ and so …
Proposition 2 (Complexity of Consistency Checking). Determining whether a feature subscription $\langle F, H, P \rangle$ is consistent or not can be checked in $O(|F| + |H| + |P|)$.
Proof. We use Topological Sort (Cormen, Leiserson, & Rivest, 1990). In Topological Sort we are interested in ordering the nodes of a directed graph such that if a directed edge $\langle i, j \rangle$ is in the set of edges of the graph then node $i$ is less than node $j$ in the order. In order to use Topological Sort for detecting whether a feature subscription is consistent, we associate nodes with features and edges with precedence constraints. Then, the subscription is consistent if and only if for all edges $\langle i, j \rangle$ in the graph associated with the subscription, $i$ precedes $j$ in the order computed by Topological Sort. As the complexity of Topological Sort is linear with respect to the size of the graph (i.e., the sum of the number of nodes and the number of edges of the graph), detecting whether a feature subscription is consistent is $O(|F| + |H| + |P|)$.
Definition 7 (Anti-subscription). Given a catalogue $\langle F_c, H_c \rangle$ and a consistent feature subscription $S = \langle F, H, P \rangle$, the anti-subscription is the tuple $\langle F_a, P_a \rangle$ defined as follows. A feature $f \in F_c$ is an element of $F_a$ if and only if the directed graph associated with the subscription obtained after adding feature $f$, i.e., $\langle F \cup \{f\}, H_c\downarrow_{F \cup \{f\}} \cup P \rangle$, is cyclic; for all $i, j \in F$, $i \prec j$ is in $P_a$ if and only if the directed graph associated with the subscription obtained after adding precedence $i \prec j$, i.e., $\langle F \cup \{i, j\}, H_c\downarrow_{F \cup \{i, j\}} \cup P \cup \{i \prec j\} \rangle$, is cyclic.
The definition of anti-subscription suggests one way of computing the anti-subscription of a given subscription. In order to test whether a feature/precedence belongs to the anti-subscription we check the consistency of the resulting subscription. As there are $O(|F_c|)$ …
Definition 8 (Partial Order of a Consistent Subscription). Given a consistent subscription $\langle F, H, P \rangle$, the partial order of the subscription is the transitive closure $(H \cup P)^*$ of the relation $H \cup P$.
The worst-case complexity of finding this transitive closure is $O(|F|^3)$.
Definition 9 (Total Order of a Consistent Subscription). A total order of a consistent subscription $S$ is a topological sort of the directed graph $\langle F, H \cup P \rangle$, i.e., a total order extending the relation $H \cup P$.
The worst-case complexity of finding such a total order is linear in time with respect to the size of the corresponding graph.
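The consistency check and the computation of one total order can be sketched with a standard queue-based topological sort. This is a minimal illustration, assuming features are comparable labels (strings here) and that every precedence refers to features of the subscription; the names `total_order` and `is_consistent` are hypothetical.

```python
from collections import deque

def total_order(features, precedences):
    """Return a list ordering `features` so that i comes before j for every
    pair (i, j) in `precedences`, or None if the associated graph is cyclic."""
    succ = {f: set() for f in features}
    indeg = {f: 0 for f in features}
    for i, j in precedences:
        if j not in succ[i]:          # ignore duplicate edges
            succ[i].add(j)
            indeg[j] += 1
    # Start from the features with no predecessor (sorted for determinism).
    queue = deque(sorted(f for f in features if indeg[f] == 0))
    order = []
    while queue:
        f = queue.popleft()
        order.append(f)
        for g in sorted(succ[f]):
            indeg[g] -= 1
            if indeg[g] == 0:
                queue.append(g)
    # Features left unordered are exactly those on or behind a cycle.
    return order if len(order) == len(features) else None

def is_consistent(F, H, P):
    """A subscription <F, H, P> is consistent iff <F, H u P> is acyclic."""
    return total_order(F, H | P) is not None
```

With this encoding, an exclusion constraint written as the two precedences `("a", "b")` and `("b", "a")` forms a cycle, so any subscription containing both features is reported as inconsistent.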

Symmetry Inherent in the Reformulation
One of the services provided to an end-user when configuring a feature subscription is the computation of all compatible pairs of total orders on source and target features. In this section, we show that when an original subscription, as defined in Section 3, is reformulated, as described in Section 4, symmetries are introduced. Two total orders in the reformulated subscription are symmetric if they correspond to the same pair of total orders (on source and target features) in the original subscription. More formally, let $S^o = \langle F^o_s, H^o_s, P^o_s, F^o_t, H^o_t, P^o_t \rangle$ be a subscription of the catalogue $\langle \mathcal{F}_s, \mathcal{H}_s, \mathcal{F}_t, \mathcal{H}_t \rangle$, and $S_r = \langle F_r, H_r, P_r \rangle$ be the corresponding subscription of the reformulated catalogue $\langle F_c, H_c \rangle$, where $F_r = F^o_s \cup F^o_t$, $H_r = H^o_s \cup (H^o_t)^{T}$, and $P_r = P^o_s \cup (P^o_t)^{T}$, the superscript $T$ denoting the transpose of a relation. A pair of total orders $\langle T_s, T_t \rangle$ is compatible with $S^o$ if Conditions (1), (2) and (3) of Definition 3 hold. There is a many-to-one relation between the set of total orders of $S_r$ (see Definition 9) and the set of compatible pairs of total orders of $S^o$.
Let us consider the subscription $S^o$ where $F^o_s = \{1, 2, 3\}$, $H^o_s = \{1 \prec 2\}$, $F^o_t = \{2, 3, 4\}$, $H^o_t = \{4 \prec 3\}$, and $P^o_s$ and $P^o_t$ are empty. The corresponding $S_r$ would have $F_r = \{1, 2, 3, 4\}$, $H_r = \{1 \prec 2, 3 \prec 4\}$, and $P_r = \emptyset$. Both $S^o$ and $S_r$ are consistent. The set of total orders of $S_r$ and the set of compatible pairs of total orders of $S^o$ are shown in Table 1. The cardinality of the former set is six, while that of the latter is only five. The last two total orders of $S_r$ correspond to the last compatible pair of total orders of $S^o$. This is due to the fact that the union of a total order on source features and the transpose of a total order on target features in $S^o$ is not necessarily a total order. For example, for the last pair of total orders of $S^o$ in Table 1, the union of $3 \prec 1 \prec 2$ and $3 \prec 4 \prec 2$ does not result in a total order, since there is no order between 1 and 4.
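The counts quoted in this example can be checked by brute force. The Python sketch below enumerates the total orders of $S_r$ and projects each onto the source and target features. It assumes source features {1, 2, 3} and target features {2, 3, 4}, an assumption chosen to agree with $F_r$, $H_r$ and the orders $3 \prec 1 \prec 2$ and $3 \prec 4 \prec 2$ quoted in the text; the helper names are illustrative.

```python
from itertools import permutations

def total_orders(features, precedences):
    """All permutations of `features` placing i before j for every (i, j)."""
    result = []
    for perm in permutations(sorted(features)):
        pos = {f: k for k, f in enumerate(perm)}
        if all(pos[i] < pos[j] for i, j in precedences):
            result.append(perm)
    return result

def project(order, subset):
    """Restrict a total order (a tuple) to the features in `subset`."""
    return tuple(f for f in order if f in subset)

F_s, F_t = {1, 2, 3}, {2, 3, 4}      # assumed source and target features
H_r = {(1, 2), (3, 4)}               # reformulated catalogue precedences
orders_r = total_orders(F_s | F_t, H_r)

# Each total order of S_r induces a source order and a (transposed) target
# order; symmetric total orders collapse onto the same pair.
pairs = {(project(o, F_s), project(o, F_t)) for o in orders_r}
```

Here `orders_r` contains six total orders while `pairs` contains five elements, because the orders (3, 1, 4, 2) and (3, 4, 1, 2) differ only in the relative position of features 1 and 4, which never occur together in the same region.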
The repetition of the computation of the symmetric pairs of total orders of the original subscription from the total orders of the reformulated subscription is not desirable. In order to compute a compatible pair of total orders only once, we use the algorithm GetSolutions($S_r$), as shown in Algorithm 1. This algorithm has two nested loops. In the first loop it selects a total order on the set of reversible features and then extends this total order to generate a set of total orders on source features and a set of total orders on target features. In the second loop a total order on source features and a total order on target features are selected from the previously generated sets. Due to the fact that the source features and the target features are ordered independently in GetSolutions($S_r$), no unnecessary ordering is imposed between the source features and the target features.
Table 1: Total orders on $F_r$, $F^o_s$, and $F^o_t$.
Algorithm 1 GetSolutions($S_r$)
Require: … $F^o_t$ is the set of target features, and
• GetTotalOrders($\langle F, O \rangle$) generates the set of all total orders that extend a given acyclic binary relation $O$ defined on a set of features $F$.
• $\prec$, $\prec_R$, $\prec_S$, and $\prec_T$ are set to $(H_r \cup P_r)^*$, $\prec\downarrow_{F^o_r}$, $\prec\downarrow_{F^o_s}$, and $\prec\downarrow_{F^o_t}$ respectively.
Ensure: PTOs is the set of pairs of compatible total orders on $F^o_s$ and $F^o_t$ respectively.
…
6: for all s ∈ STOs, t ∈ TTOs do
7: …
The algorithm computes and saves all total orders on a given set of reversible features in RTOs, and for a given total order on the set of reversible features it computes and saves all the total orders on source and target features in STOs and TTOs respectively. However, this is presented in the algorithm for the purpose of clarity. In practice, a total order is computed lazily, i.e., a total order is only computed when it is needed, thus avoiding the need to keep all the generated total orders in memory.
The amortised time complexity of computing all the total orders extending a given acyclic binary relation is linear with respect to the number of total orders (Pruesse & Ruskey, 1994). Assuming that there are $\tau_r$ total orders on $F^o_r$ and at most $\tau_s$ and $\tau_t$ total orders on $F^o_s$ and $F^o_t$, respectively, that are consistent with a given total order on $F^o_r$, the time complexity of GetSolutions is $O(\tau_r \times \tau_s \times \tau_t)$. The computation of all the pairs of compatible total orders could be impractical when the size of the resulting set is very large. Therefore, in those cases the number of total orders computed could be restricted to a pre-specified number, and a heuristic can be used to select r in Line 3, and s and t in Line 6 of Algorithm 1.
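The two nested loops described for GetSolutions can be sketched with lazy Python generators. This is an illustrative reading of the prose, not the authors' pseudo-code: `get_total_orders` filters permutations by brute force instead of using the amortised algorithm of Pruesse and Ruskey, the relation `rel` is assumed to be already transitively closed, and target orders are yielded in the reformulated (transposed) direction.

```python
from itertools import permutations

def get_total_orders(features, rel):
    """Lazily yield every total order (a tuple) on `features` that extends
    the pairs of `rel` lying inside `features`."""
    for perm in permutations(sorted(features)):
        pos = {f: k for k, f in enumerate(perm)}
        if all(pos[i] < pos[j] for i, j in rel if i in pos and j in pos):
            yield perm

def as_pairs(order):
    """Turn a total order given as a tuple into its set of ordered pairs."""
    return {(order[a], order[b])
            for a in range(len(order)) for b in range(a + 1, len(order))}

def get_solutions(F_s, F_t, rel):
    """Yield each compatible (source order, transposed target order) once."""
    F_rev = F_s & F_t                       # reversible features
    for r in get_total_orders(F_rev, rel):  # outer loop: order the reversibles
        fixed = rel | as_pairs(r)
        # Inner loops: extend r independently on the source and target sides.
        for s in get_total_orders(F_s, fixed):
            for t in get_total_orders(F_t, fixed):
                yield s, t

solutions = list(get_solutions({1, 2, 3}, {2, 3, 4}, {(1, 2), (3, 4)}))
```

Because the generators are consumed on demand, a caller can stop after a pre-specified number of pairs, for instance with `itertools.islice`, without materialising the remaining total orders.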
There may be some pairs of total orders on $F^o_s$ and $F^o_t$ that are more desirable than others. For instance, it would be more desirable to present to an end-user those pairs of total orders that are easier to extend (in terms of the addition of a feature or a user precedence). One way of doing this is to use the notion of anti-subscription (see Definition 7). Each pair of total orders can be associated with an anti-subscription. The size of the anti-subscription is the sum of the number of features and precedences that are involved in it. The pairs of total orders can be ordered by increasing size of their corresponding anti-subscriptions. The size of an anti-subscription in some sense reflects how constrained a pair of total orders is with respect to the features and user precedences that an end-user may consider adding to his/her subscription in the future.

Relaxations of Feature Subscriptions
If an input feature subscription is not consistent then the goal is to relax it by dropping one or more features or user precedence constraints to generate a consistent feature subscription that is closest to the initial user's requirements. Therefore, we introduce a function $w : F \cup P \rightarrow \mathbb{N}$ that assigns weights to features and user precedence constraints, indicating the importance to the user of the features and user precedences. These weights could be elicited directly through data mining or analysis of user interactions. In the rest of the paper a feature subscription is denoted by $S = \langle F, H, P, w \rangle$. The value of the subscription $S$ is defined by $\mathrm{Value}(S) = \sum_{f \in F} w(f) + \sum_{\rho \in P} w(\rho)$.
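As a small worked example of the value of a subscription, the Python sketch below sums the weights of the selected features and user precedences. The feature labels and weights are invented for illustration, and the weight function is represented as a dictionary keyed by features and by precedence pairs.

```python
def value(features, user_precedences, w):
    """Value(S): total weight of the features plus that of the user precedences."""
    return sum(w[f] for f in features) + sum(w[p] for p in user_precedences)

F = {"a", "b", "c"}                       # hypothetical feature labels
P = {("a", "b")}                          # one user precedence a < b
w = {"a": 4, "b": 2, "c": 1, ("a", "b"): 3}

full_value = value(F, P, w)               # 4 + 2 + 1 + 3
without_b = value(F - {"b"}, set(), w)    # dropping b also drops a < b
```

Removing a feature also removes every user precedence that mentions it, which is why the second call passes an empty set of precedences.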
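Definition 10 (Relaxation). A relaxation of a feature subscription $\langle F, H, P, w \rangle$ of a catalogue $\langle F_c, H_c \rangle$ is a subscription $\langle F', H', P', w' \rangle$ such that $F' \subseteq F$, $H' = H\downarrow_{F'}$, $P' \subseteq P\downarrow_{F'}$ and $w'$ is $w$ restricted to $F' \cup P'$.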
Definition 11 (Optimal Relaxation). Let $R_S$ be the set of all consistent relaxations of a feature subscription $S$. We say that $S_i \in R_S$ is an optimal relaxation of $S$ if it has maximum value among all consistent relaxations, i.e., if and only if there does not exist $S_j \in R_S$ such that $\mathrm{Value}(S_j) > \mathrm{Value}(S_i)$.
Proposition 3 (Complexity of Finding an Optimal Relaxation).Finding an optimal relaxation of a feature subscription is NP-hard.
Proof. Given a directed graph $G = \langle V, E \rangle$, the Feedback Vertex Set Problem is to find a smallest $V' \subseteq V$ whose deletion makes the graph acyclic. This problem is known to be NP-hard (Garey & Johnson, 1979). We prove that finding an optimal relaxation is NP-hard by a reduction from the feedback vertex set problem. The feedback vertex set problem can be reduced to our problem by associating the nodes $V$ of the directed graph with features $F$, and the edges $E$ with catalogue precedence constraints $H$. We set $P$ to $\emptyset$ and define $w$ by $w(f) = 1$, for all $f \in F$. Thus, finding an optimal relaxation of $S = \langle F, H, P, w \rangle$ corresponds to finding a biggest set of nodes $V'' \subseteq V$ such that the deletion of $V - V''$ from $G$ results in an acyclic graph. Therefore, we conclude that finding an optimal relaxation of an inconsistent subscription is NP-hard.
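The correspondence used in this reduction can be illustrated on a tiny graph. The Python sketch below finds a largest set of nodes inducing an acyclic subgraph by exhaustive search, which is exponential and is only meant to show that, with unit weights and no user precedences, the retained nodes form an optimal relaxation and the removed nodes form a smallest feedback vertex set. The function names and the example graph are illustrative.

```python
from itertools import combinations

def is_acyclic(nodes, edges):
    """Repeatedly strip nodes with no incoming edge; acyclic iff all go."""
    nodes, edges = set(nodes), set(edges)
    while nodes:
        sources = {n for n in nodes if not any(j == n for _, j in edges)}
        if not sources:
            return False                  # every remaining node lies behind a cycle
        nodes -= sources
        edges = {(i, j) for i, j in edges if i in nodes and j in nodes}
    return True

def largest_acyclic_subset(V, E):
    """Brute force: try node subsets from the largest size downwards."""
    ordered = sorted(V)
    for size in range(len(ordered), -1, -1):
        for kept in combinations(ordered, size):
            kept_set = set(kept)
            induced = {(i, j) for i, j in E if i in kept_set and j in kept_set}
            if is_acyclic(kept_set, induced):
                return kept_set

V = {1, 2, 3, 4}
E = {(1, 2), (2, 3), (3, 1), (3, 4)}      # one cycle: 1 -> 2 -> 3 -> 1
kept = largest_acyclic_subset(V, E)
feedback_set = V - kept
```

In this instance the search keeps three of the four nodes, so the value of the optimal relaxation is 3 and the feedback vertex set has size 1.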
The most challenging operation on feature subscriptions is to find an optimal relaxation of a subscription that is not consistent, since it is NP-hard. In the remainder of the paper we focus only on this particular task.

Basic COP Model for Finding an Optimal Relaxation
In this section we model the problem of finding an optimal relaxation of a feature subscription $\langle F, H, P, w \rangle$ of catalogue $\langle F_c, H_c \rangle$ as a constraint optimisation problem (Lesaint, Mehta, O'Sullivan, Quesada, & Wilson, 2008c).
Variables and Domains. We associate each feature $i \in F$ with two variables: a Boolean variable $bf_i$ and an integer variable $pf_i$. A Boolean variable $bf_i$ is instantiated to 1 or 0 depending on whether feature $i$ is included in the subscription or not, respectively. The domain of each integer variable $pf_i$ is $\{1, \ldots, |F|\}$. Assuming that the computed subscription is consistent, an integer variable $pf_i$ corresponds to the position of the feature $i$ in a sequence, which is consistent with the optimal relaxation. We associate each user precedence constraint $(i \prec j) \in P$ with a Boolean variable $bp_{ij}$. A Boolean variable $bp_{ij}$ is instantiated to 1 or 0 depending on whether $i \prec j$ is respected in the computed subscription or not, respectively. A variable $v$ is associated with the value of the subscription, the initial lower bound of which is 0 and the initial upper bound is the sum of the weights of all the features and user precedences.
Constraints. A catalogue precedence constraint $(i \prec j) \in H$ that feature $i$ should be before feature $j$ can be expressed as follows:

$$bf_i \wedge bf_j \Rightarrow (pf_i < pf_j).$$

Note that the constraint is activated only if the selection variables $bf_i$ and $bf_j$ are instantiated to 1. A user precedence constraint $(i \prec j) \in P$ that $i$ should be placed before $j$ in their subscription can be expressed as follows:

$$bp_{ij} \Rightarrow (bf_i \wedge bf_j \wedge (pf_i < pf_j)).$$

Note that if a user precedence constraint holds then the features $i$ and $j$ are included in the subscription and the feature $i$ is placed before $j$, that is, the selection variables $bf_i$ and $bf_j$ are instantiated to 1 and $pf_i < pf_j$ is true.
The value of the subscription is equal to the sum of the weights of the included features and included user precedences. This constraint can be expressed as follows:

$$v = \sum_{i \in F} bf_i \times w(i) + \sum_{(i \prec j) \in P} bp_{ij} \times w(i \prec j). \qquad (1)$$

Enforcing arc consistency on Equation (1) is, in general, exponential (Zhang & Yap, 2000). Therefore, CP solvers perform only bounds consistency on this constraint, which is equivalent to enforcing arc consistency on the following pair of constraints, which can be seen as a decomposition of Equation (1):

$$v \geq \sum_{i \in F} bf_i \times w(i) + \sum_{(i \prec j) \in P} bp_{ij} \times w(i \prec j), \qquad (2)$$

$$v \leq \sum_{i \in F} bf_i \times w(i) + \sum_{(i \prec j) \in P} bp_{ij} \times w(i \prec j). \qquad (3)$$

In order to reason about the complexities of enforcing different consistency techniques we always assume that the two inequality constraints are used instead of the equality constraint.
Objective. The objective is to find an optimal relaxation of a feature subscription.
We have investigated the impact of maintaining three different levels of consistency within branch and bound search. The first is arc consistency and the other two are mixed consistencies. In the following sections we describe these consistency techniques and present their worst-case time complexities when enforced on any instance of feature subscription formulated as described above. The complexity results presented below assume that only the Boolean variables associated with the inclusion/exclusion of features and user precedences are decision variables. We remark that if the problem is arc-consistent after instantiating all the Boolean variables then it is also globally consistent.
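To make the semantics of the model concrete, here is a small sketch of a checker for a complete assignment. It assumes the constraints read as the implications stated in the text (a catalogue precedence is active only when both features are selected; a satisfied user precedence requires both features and the right order). The function name, argument layout, and example data are hypothetical.

```python
def subscription_value(bf, pf, bp, H, P, w_f, w_p):
    """Return the value v of a complete assignment, or None if it violates a constraint.

    bf: feature -> 0/1 (selection), pf: feature -> position,
    bp: (i, j) -> 0/1 (user precedence respected or not).
    """
    # Catalogue precedence: bf_i and bf_j imply pf_i < pf_j.
    for i, j in H:
        if bf[i] and bf[j] and not pf[i] < pf[j]:
            return None
    # User precedence: bp_ij implies bf_i, bf_j and pf_i < pf_j.
    for i, j in P:
        if bp[(i, j)] and not (bf[i] and bf[j] and pf[i] < pf[j]):
            return None
    # Value: weights of included features plus respected user precedences.
    return (sum(w_f[i] for i in bf if bf[i])
            + sum(w_p[c] for c in P if bp[c]))

# Hypothetical subscription with three features and unit weights.
H = [("a", "b")]
P = [("b", "a"), ("b", "c")]
w_f = {"a": 1, "b": 1, "c": 1}
w_p = {("b", "a"): 1, ("b", "c"): 1}
bf = {"a": 1, "b": 1, "c": 1}
pf = {"a": 1, "b": 2, "c": 3}

ok = subscription_value(bf, pf, {("b", "a"): 0, ("b", "c"): 1}, H, P, w_f, w_p)
bad = subscription_value(bf, pf, {("b", "a"): 1, ("b", "c"): 1}, H, P, w_f, w_p)
```

The second call claims that the user precedence b before a is respected although a is placed first, so the assignment is rejected.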

Arc Consistency
Let $e$ be the sum of the number of user precedences and the number of catalogue precedences, let $n$ be the sum of the number of features and the number of user precedences, and let $d$ be the number of features. The complexity of achieving arc consistency (AC) on a (catalogue/user) precedence constraint is constant with respect to the number of variables. A catalogue precedence constraint is made arc-consistent whenever one of the Boolean variables involved in the constraint is instantiated or the domain of one of its position variables is modified. Thus, a catalogue precedence constraint can be made arc-consistent at most $1 + 1 + (d - 1) + (d - 1) = 2d$ times. A user precedence constraint can be made arc-consistent at most $2d + 1$ times. Since there are, in total, $e$ precedence constraints, the worst-case time complexity of imposing arc consistency on all the precedence constraints is $O(ed)$, which is also optimal. In addition, arc consistency is enforced on the linear inequalities (2) and (3), the complexity of which is linear with respect to the number of Boolean variables. Whenever a Boolean variable is instantiated the constraint is revised and, since there are $n$ Boolean variables, it can be made arc-consistent at most $n$ times. Therefore, the worst-case time complexity of enforcing arc consistency on the linear inequalities is $O(n^2)$, which is optimal. Thus, the worst-case time complexity of enforcing AC on an instance of the basic CP model for finding an optimal relaxation is $O(ed + n^2)$.
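The linear cost of one revision of the value constraint can be seen in a short sketch. This is an illustrative assumption about how the bounds of $v$ would be recomputed from the current Boolean domains, not the solver's actual propagator; the names and example data are hypothetical.

```python
def value_bounds(domains, weights):
    """Bounds on v from the Boolean domains; one pass over the n Boolean variables.

    domains: variable -> set of remaining values (subset of {0, 1})
    weights: variable -> weight of the feature or user precedence
    """
    # Lower bound: weights of variables already fixed to 1.
    lb = sum(w for x, w in weights.items() if domains[x] == {1})
    # Upper bound: weights of variables that can still take value 1.
    ub = sum(w for x, w in weights.items() if 1 in domains[x])
    return lb, ub

# Hypothetical state: one feature included, one undecided, one precedence dropped.
domains = {"bf_1": {1}, "bf_2": {0, 1}, "bp_12": {0}}
weights = {"bf_1": 2, "bf_2": 3, "bp_12": 1}
bounds = value_bounds(domains, weights)
```

Each call is linear in the number of Boolean variables, and the constraint is revised at most once per instantiated Boolean variable, which gives the quadratic bound stated above.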

Singleton Arc Consistency
Maintaining a higher level of consistency can be expensive in terms of time. However, if more values can be removed from the domains of the variables, the search effort can be reduced and this may save time. We investigate the effect of maintaining Singleton Arc Consistency (SAC) on the Boolean variables and AC on the remaining variables, and denote it by SAC$_b$. We have used the SAC-1 algorithm (Debruyne & Bessiere, 1997) for enforcing SAC on the Boolean variables. Enforcing SAC on the Boolean variables in a SAC-1 manner works by traversing a list of $2n$ variable-value pairs. For each instantiation of a Boolean variable $x$ to each value 0/1, if there is a domain wipeout while enforcing AC then the value is removed from the corresponding domain and AC is enforced. Each time a value is removed, the list is traversed again. Since there are $2n$ variable-value pairs, the number of calls to the underlying arc consistency algorithm is at most $4n^2$. Thus the worst-case time complexity of SAC$_b$ is $O(n^2(ed + n^2))$.

SAC$_b$ does not have an optimal worst-case time complexity. In SAC$_b$ arc consistency can be enforced on a subproblem obtained by restricting a Boolean variable to a single value at most $2n$ times, and each time arc consistency is established from scratch. However, one can take the incremental property of arc consistency into account to obtain an optimal version of SAC$_b$. Following the work of Lecoutre (2009), an arc consistency algorithm is said to be incremental if and only if its worst-case time complexity is the same when it is applied once on a given network $P$ and when it is applied up to $m$ times on $P$ where, between any two consecutive executions, at least one value has been deleted. Here $m$ is the sum of the domain sizes of all the variables involved in the problem $P$. The idea behind an optimal version is that we do not want to achieve arc consistency from scratch in each subproblem but, instead, benefit from the incremental property of the underlying arc consistency algorithm. This results in an asymptotic complexity of $O(ed + n^2)$ for enforcing arc consistency $2n$ times. Thus, the time complexity of an optimal version of SAC$_b$ would be $O(n(ed + n^2))$.

Restricted Singleton Arc Consistency
The main problem with SAC-1 is that deleting a single value triggers the loop again. Restricted Singleton Arc Consistency (RSAC) avoids this by considering each variable-value pair only once (Prosser, Stergiou, & Walsh, 2000). We investigate the effect of enforcing RSAC on the Boolean variables and AC on the remaining variables, and denote it by RSAC$_b$. The worst-case time complexity of RSAC$_b$ is $O(n(ed + n^2))$.
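The difference between the SAC-1 loop and its restricted variant can be sketched generically. The code below is a simplified illustration under stated assumptions: `propagate` stands in for the underlying arc consistency algorithm, and the toy propagator works on clauses over Boolean variables rather than on the feature subscription model. All names are hypothetical.

```python
def make_propagator(clauses):
    """Toy propagator: each clause is a list of (var, value) pairs, one must hold."""
    def propagate(domains):
        domains = {x: set(d) for x, d in domains.items()}
        progress = True
        while progress:
            progress = False
            for clause in clauses:
                open_lits = [(x, v) for x, v in clause if v in domains[x]]
                if not open_lits:
                    return None            # domain wipeout
                if len(open_lits) == 1:
                    x, v = open_lits[0]
                    if domains[x] != {v}:
                        domains[x] = {v}   # the only supported literal is forced
                        progress = True
        return domains
    return propagate

def singleton_filter(domains, bool_vars, propagate, restricted):
    """Test each Boolean value by assigning it and propagating.

    restricted=False re-traverses the list after any removal (SAC-1 style);
    restricted=True makes a single pass over the list (RSAC style).
    """
    domains = propagate(domains)
    changed = domains is not None
    while changed:
        changed = False
        for x in bool_vars:
            for val in sorted(domains[x]):
                if len(domains[x]) == 1:
                    break                  # already fixed, nothing to test
                trial = dict(domains)
                trial[x] = {val}
                if propagate(trial) is None:
                    reduced = dict(domains)
                    reduced[x] = domains[x] - {val}
                    domains = propagate(reduced)
                    if domains is None:
                        return None
                    changed = True
        if restricted:
            break
    return domains

# (x or y) and (x or not y): plain propagation infers nothing, singleton tests fix x = 1.
prop = make_propagator([[("x", 1), ("y", 1)], [("x", 1), ("y", 0)]])
start = {"x": {0, 1}, "y": {0, 1}}
after_sac = singleton_filter(start, ["x", "y"], prop, restricted=False)
after_rsac = singleton_filter(start, ["x", "y"], prop, restricted=True)
```

On this tiny example both variants prune the same value; in general the restricted variant may prune less because it never revisits a pair.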

Other CP Models
In this section we present two more CP approaches. The first approach uses a global constraint that achieves a higher level of consistency by taking into account the cycles of the precedence constraints. In the second approach we model the problem as a weighted constraint satisfaction problem.

Global Constraint
A global constraint captures a relation between several variables. It takes into account the structure of the problem to prune more values. For instance, if a user has selected a set of features $F = \{1, 2, 3, 4\}$, if these features are constrained by the catalogue precedences $1 \prec 2$, $2 \prec 1$, $3 \prec 4$ and $4 \prec 3$, and if three features are required to be included in the subscription, then one can infer that the problem is inconsistent without doing any search. This is possible by inferring cycles from the precedence constraints and using them to prune the bounds of the objective function.
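The inference in this example can be reproduced with a few lines of code. The sketch below is a deliberately simple bound, not the SoftPrec filtering algorithm: it only looks at pairs of features that are directly incompatible (both $i \prec j$ and $j \prec i$ in the catalogue) and greedily collects disjoint pairs, each of which forces at least one exclusion. The function name is hypothetical.

```python
def feature_count_upper_bound(features, catalogue_prec):
    """Upper bound on the number of features in any consistent relaxation."""
    prec = set(catalogue_prec)
    used, forced_out = set(), 0
    for i, j in sorted(prec):
        # i and j are incompatible when they precede each other.
        if (j, i) in prec and i not in used and j not in used:
            used |= {i, j}      # disjoint incompatible pair: one of them must go
            forced_out += 1
    return len(features) - forced_out

F = [1, 2, 3, 4]
H = [(1, 2), (2, 1), (3, 4), (4, 3)]
bound = feature_count_upper_bound(F, H)
```

Since at most two features can be included, a requirement of three is refuted without search.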
The soft global precedence constraint SoftPrec was proposed by Lesaint et al. (2008a). It holds if and only if there is a strict partial order on the selected features subject to the relevant hard (catalogue) precedence constraints and the selected soft (user) precedence constraints, and the value of the subscription is within the provided bounds. As shown by Lesaint et al. (2008a), achieving AC for SoftPrec is NP-complete since there is no way to determine in polynomial time whether there is a strict partial order whose value is between the given bounds. Therefore, AC is approximated by pruning the domains of the variables based on the filtering rules that follow from the definition of SoftPrec. The time complexity of achieving this pruning is $O(|F|^3)$, which is polynomial. The upper bound of the value of the subscription is pruned based on the incompatibilities that are inferred between pairs of features, and the dependencies between user precedences and their corresponding features. The pruning rules of SoftPrec are used within branch and bound search to find an optimal relaxation of a feature subscription.
Let $\langle F, H, P, w \rangle$ be a subscription. Let $\mathbf{bf}$ be a vector of Boolean variables associated with $F$. We say that feature $i$ is included if $\mathbf{bf}(i) = 1$, and $i$ is excluded if $\mathbf{bf}(i) = 0$. We abuse the notation by using $\mathbf{bf}(i)$ to mean $\mathbf{bf}(i) = 1$, and $\neg\mathbf{bf}(i)$ to mean $\mathbf{bf}(i) = 0$. A similar convention is adopted for the other Boolean variables. Let $\mathbf{bp}$ be a $|F|^2$ matrix of Boolean variables. Here $\mathbf{bp}$ is intended to represent a strict partial order on the included features $F$ which is compatible with the catalogue constraints restricted to $F$.
Definition 12 (SoftPrec). Let $S = \langle F, H, P, w \rangle$ be a feature subscription, $\mathbf{bf}$ and $\mathbf{bp}$ be vectors of Boolean variables, and $v$ be an integer variable. SoftPrec$(S, \mathbf{bf}, \mathbf{bp}, v)$ holds if and only if
1. $\mathbf{bp}$ is a strict partial order restricted to $\mathbf{bf}$, i.e., ∀i, j ∈ F : bp

The set of constraints in this CP model only contains SoftPrec. The decision variables in this model are $\mathbf{bf}$ and $\mathbf{bp}$. A solution of SoftPrec is a consistent relaxation of the subscription $\langle F, H, P, w \rangle$. Notice that the feedback vertex set problem (Garey & Johnson, 1979) can be expressed in terms of SoftPrec by associating vertices with features and arcs with catalogue precedence constraints. Therefore, achieving generalised arc consistency on SoftPrec is NP-hard.

Weighted CSP Model
The classical CSP framework has been extended by associating weights (or costs) with tuples (Larrosa, 2002). The Weighted Constraint Satisfaction Problem (WCSP) is a specific extension that relies on a specific valuation structure $S(k)$ defined as follows.
A WCSP instance is defined by a valuation structure $S(k)$, a set of variables (as for classical CSP instances) and a set of constraints. A domain is associated with each variable and a cost function with each constraint. More precisely, for each constraint $C$ and each tuple $t$ that can be built from the domains associated with the variables involved in $C$, a value in $\{0, 1, \ldots, k\}$ is assigned to $t$. When a constraint $C$ assigns the cost $k$ to a tuple $t$, it means that $C$ forbids $t$. Otherwise, $t$ is permitted by $C$ with the corresponding cost. The cost of an instantiation of variables is the sum (using operator $\oplus$) over all constraints involving only instantiated variables. An instantiation is consistent if its cost is strictly less than $k$. The goal of the WCSP problem is to find a full consistent assignment of variables with minimum cost. A WCSP formulation for finding an optimal relaxation of the input subscription $\langle F, H, P, w \rangle$, when inconsistent, is outlined below.
The maximum acceptable cost is

We associate each feature $i \in F$ with an integer variable $pf_i$. The domain of each integer variable, $D(pf_i)$, is $\{0, \ldots, |F|\}$. If $pf_i$ is instantiated to 0, it indicates that $i$ is excluded from the subscription.
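A unary cost function $C_i : D(pf_i) \rightarrow \{0, w(i)\}$ assigns costs to assignments of variable $pf_i$ in the following way:

$$C_i(a) = \begin{cases} w(i) & \text{if } a = 0, \\ 0 & \text{otherwise.} \end{cases}$$

A catalogue precedence constraint $(i \prec j) \in H$ is associated with a binary cost function $H_{i \prec j} : D(pf_i) \times D(pf_j) \rightarrow \{0, k\}$ that assigns costs to assignments of variables $pf_i$ and $pf_j$ in the following way:

$$H_{i \prec j}(a, b) = \begin{cases} k & \text{if } a > 0 \wedge b > 0 \wedge a \geq b, \\ 0 & \text{otherwise.} \end{cases}$$

A user precedence constraint $(i \prec j) \in P$ is associated with a binary cost function $P_{i \prec j} : D(pf_i) \times D(pf_j) \rightarrow \{0, w(i \prec j)\}$ that assigns costs to assignments of variables $pf_i$ and $pf_j$ in the following way:

$$P_{i \prec j}(a, b) = \begin{cases} w(i \prec j) & \text{if } a = 0 \vee b = 0 \vee a \geq b, \\ 0 & \text{otherwise.} \end{cases}$$

Note that if a user precedence constraint holds then the features $i$ and $j$ are included in the subscription and the feature $i$ is placed before $j$, that is, the integer variables $pf_i$ and $pf_j$ are instantiated to values greater than 0 and $pf_i < pf_j$ is true.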
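The cost of a complete assignment under this formulation can be sketched as follows. Two points are assumptions made for the illustration only: the top cost `k` is taken to be one more than the sum of all weights (any value strictly above the largest finite cost would do), and costs are combined with the bounded addition $a \oplus b = \min(k, a + b)$ that is usual for this valuation structure. The function name and data are hypothetical.

```python
def wcsp_cost(pf, H, P, w_f, w_p):
    """Return (cost, k) for a complete assignment pf: feature -> position (0 = excluded)."""
    k = sum(w_f.values()) + sum(w_p.values()) + 1   # assumed top cost
    cost = 0
    for i, pos in pf.items():        # unary: an excluded feature costs w(i)
        if pos == 0:
            cost += w_f[i]
    for i, j in H:                   # hard: both included but wrongly ordered
        if pf[i] > 0 and pf[j] > 0 and pf[i] >= pf[j]:
            cost += k
    for i, j in P:                   # soft: user precedence not respected
        if not (0 < pf[i] < pf[j]):
            cost += w_p[(i, j)]
    return min(cost, k), k           # bounded addition caps the cost at k

H = [("a", "b")]
P = [("b", "a"), ("b", "c")]
w_f = {"a": 1, "b": 1, "c": 1}
w_p = {("b", "a"): 1, ("b", "c"): 1}

consistent = wcsp_cost({"a": 1, "b": 2, "c": 3}, H, P, w_f, w_p)
forbidden = wcsp_cost({"a": 2, "b": 1, "c": 3}, H, P, w_f, w_p)
empty = wcsp_cost({"a": 0, "b": 0, "c": 0}, H, P, w_f, w_p)
```

An assignment is consistent exactly when its cost is strictly below `k`; the empty subscription is always consistent but pays every weight.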

Boolean Satisfiability
The Boolean Satisfiability Problem (SAT) is a decision problem an instance of which is an expression in propositional logic. The problem is to decide whether there is an assignment of true and false values to the variables that will make the expression true. The expression is normally written in conjunctive normal form. The Partial Weighted Maximum Boolean Satisfiability Problem (PWMSAT) is an extension of SAT that includes the notions of hard and soft clauses. Any solution should respect the hard clauses. Soft clauses are associated with weights. The goal is to find an assignment that satisfies all the hard clauses and minimises the sum of the weights of the unsatisfied soft clauses. In this section we present Boolean satisfiability formulations for finding an optimal relaxation of a feature subscription.

Atom-based Encoding
In an atom-based encoding, each atom, like $f \prec g$, is associated with a propositional variable, and the asymmetricity and transitivity properties of the precedence relation are explicitly encoded. An atom-based encoding of finding an optimal relaxation of a feature subscription $\langle F, H, P, w \rangle$ is outlined below.
Variables. Let PrecDom be the set of possible precedence constraints that can be defined on $F$, i.e., $\{i \prec j : \{i, j\} \subseteq F \wedge i \neq j\}$. For each feature $i \in F$ there is a Boolean variable $bf_i$, which is true or false depending on whether feature $i$ is included or not in the computed subscription. For each precedence constraint $(i \prec j)$ there is a Boolean variable $bp_{ij}$, which is true or false depending on whether the precedence constraint holds or not in the computed subscription. If $bp_{ij}$ is true, then, roughly speaking, it means that features $i$ and $j$ are included, and $i$ precedes $j$.
Clauses. Each weighted clause is represented by a tuple $\langle w, c \rangle$, where $w$ is the weight of the clause $c$. Note that the hard clauses are associated with weight $\top$, which represents an infinite penalty for not satisfying them.
Each catalogue precedence constraint $(i \prec j) \in H$ must be satisfied if the features $i$ and $j$ are included in the computed subscription. This is modelled by adding the following hard clause:

$$\langle \top, (\neg bf_i \vee \neg bf_j \vee bp_{ij}) \rangle.$$

The precedence relation should be transitive and asymmetric in order to ensure that the subscription graph is acyclic. To ensure asymmetricity, the following clause is added for every pair $\{i \prec j, j \prec i\} \subseteq$ PrecDom:

$$\langle \top, (\neg bp_{ij} \vee \neg bp_{ji}) \rangle. \qquad (4)$$

Both $bp_{ij}$ and $bp_{ji}$ can be false. However, if one of them is true the other one should be false.
To ensure transitivity, for every $\{i \prec j, j \prec k\} \subseteq$ PrecDom, the following clause is added:

$$\langle \top, (\neg bp_{ij} \vee \neg bp_{jk} \vee bp_{ik}) \rangle. \qquad (5)$$

Note that Rule (5) need only be applied to $i, j, k$ such that $i \neq k$, since precedence constraints are not reflexive because of Rule (4).
Each precedence constraint $(i \prec j) \in$ PrecDom is only satisfied when its corresponding features $i$ and $j$ are included. This is ensured by considering the following clauses:

$$\langle \top, (\neg bp_{ij} \vee bf_i) \rangle, \qquad \langle \top, (\neg bp_{ij} \vee bf_j) \rangle.$$

We need to penalise any solution that does not include a feature $i \in F$ or a user precedence constraint $(i \prec j) \in P$. This is done by adding the following clauses:

$$\langle w(i), (bf_i) \rangle, \qquad \langle w(i \prec j), (bp_{ij}) \rangle.$$

The cost of violating these clauses is the weight of the feature $i$ and the weight of the user precedence constraint $i \prec j$ respectively.
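The clause families described above can be generated mechanically. The following sketch assumes the clause shapes given in the text (catalogue implication, asymmetry, transitivity, inclusion of both features, and unit soft clauses); the data layout, the use of `float("inf")` for the hard weight, and all names are hypothetical choices for illustration, and features are assumed to be comparable so that each unordered pair is handled once.

```python
from itertools import permutations

TOP = float("inf")  # stand-in for the weight of hard clauses

def atom_based_encoding(F, H, P, w_f, w_p):
    """Return a list of (weight, clause); a literal is (variable, polarity)."""
    def bf(i):
        return ("bf", i)

    def bp(i, j):
        return ("bp", i, j)

    clauses = []
    for i, j in H:   # catalogue precedence holds when both features are included
        clauses.append((TOP, [(bf(i), False), (bf(j), False), (bp(i, j), True)]))
    for i, j in permutations(F, 2):
        if i < j:    # asymmetry, once per unordered pair
            clauses.append((TOP, [(bp(i, j), False), (bp(j, i), False)]))
        # a precedence can only hold between included features
        clauses.append((TOP, [(bp(i, j), False), (bf(i), True)]))
        clauses.append((TOP, [(bp(i, j), False), (bf(j), True)]))
    for i, j, k in permutations(F, 3):   # transitivity over distinct i, j, k
        clauses.append((TOP, [(bp(i, j), False), (bp(j, k), False), (bp(i, k), True)]))
    # soft clauses: penalise dropped features and dropped user precedences
    clauses += [(w_f[i], [(bf(i), True)]) for i in F]
    clauses += [(w_p[c], [(bp(*c), True)]) for c in P]
    return clauses

F = [1, 2, 3]
H = [(1, 2)]
P = [(2, 1), (2, 3)]
encoding = atom_based_encoding(F, H, P, {1: 1, 2: 1, 3: 1}, {(2, 1): 1, (2, 3): 1})
hard = [c for weight, c in encoding if weight == TOP]
```

For three features this yields 1 catalogue clause, 3 asymmetry clauses, 12 inclusion clauses, 6 transitivity clauses and 5 soft clauses.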
Reducing the Variables and Clauses. It is straightforward to see that the atom-based encoding described in the previous section requires $\Theta(n^2)$ Boolean variables and $\Theta(n^3)$ clauses, where $n$ is the number of features¹. We now describe two techniques which can reduce the number of variables and clauses. The subscription contains a cycle if and only if the transitive closure of $H \cup P$ contains a cycle. Therefore, instead of associating a Boolean variable with each possible precedence constraint, it is sufficient to associate Boolean variables only with the precedence constraints in the transitive closure of $H \cup P$.
Reducing the number of Boolean variables also reduces the number of transitivity clauses, especially when the input subscription graph is not dense. Otherwise, Rule ( 5 4) is acyclic, the 120 transitivity clauses and 12 asymmetricity clauses are redundant. Thus, if PrecDom is instead set to be the transitive closure of $H \cup P$, then Rules (4) and (5) would not generate any redundant clauses. We further reduce the number of transitivity clauses $\langle \top, (\neg bp_{ij} \vee \neg bp_{jk} \vee bp_{ik}) \rangle$ by considering only those where none of $j \prec i$, $k \prec j$, and $i \prec k$ are in $H$, especially when the input subscription graph is not sparse. The reason for this is that these transitivity clauses are always entailed due to the enforcement of the catalogue precedence constraints. This reduction in the number of clauses might reduce the memory requirement and might also have an impact on the efficiency of unit propagation, which in turn may reduce the runtime.

Symbol-based Encoding
Another SAT approach, based on a symbol-based encoding of partial order constraints, is presented by Codish et al. (2009). Partial order constraints (Codish, Lagoon, & Stuckey, 2008) are basically propositional formulae except that propositions can also be statements about a partial order on a finite set of symbols. In a symbol-based encoding the transitivity and asymmetricity properties of a precedence relation are enforced implicitly.
Here also a Boolean variable $bf_i$ is associated with each feature $i \in F$, indicating whether $i$ is included or excluded. A Boolean variable $bp_{ij}$ is associated with each precedence constraint $(i \prec j) \in H \cup P$. For each catalogue constraint $(i \prec j) \in H$ the following clause is added: $\langle \top, (\neg bf_i \vee \neg bf_j \vee bp_{ij}) \rangle$. For each precedence constraint $(i \prec j) \in H \cup P$ the following clauses are added: $\langle \top, (\neg bp_{ij} \vee bf_i) \rangle$ and $\langle \top, (\neg bp_{ij} \vee bf_j) \rangle$. For each precedence constraint $(i \prec j) \in H \cup P$ the propositional constraint $bp_{ij} \Rightarrow i \prec j$ is encoded². This intuitively means that if $bp_{ij}$ is true then $i$ precedes $j$. Two different ways of encoding a precedence constraint $i \prec j$ are presented by Codish et al. (2009), which are called the unary encoding and the binary encoding. A brief description of them is presented in Section 9.2.1 and Section 9.2.2, which will provide a basis for their theoretical comparison.
Advanced techniques for encoding the objective function have also been proposed by Codish et al. (2009). However, the encoding of the objective function is orthogonal to the way the precedences are encoded. As our purpose is to compare the encodings of the precedence constraints, we omit the details of the encoding of the objective function for the symbol-based encoding proposed by Codish et al. (2009). Instead, we assume that in this approach the objective function is encoded as it is done in the atom-based case. Therefore, in the PWMSAT setting the following soft clauses are added for features and user precedences: $\langle w(i), (bf_i) \rangle$ and $\langle w(i \prec j), (bp_{ij}) \rangle$.

Unary Encoding
In the symbol-based unary encoding (Codish et al., 2009) each feature is associated with an ordered set of Boolean variables that represents the unary encoding of its position. The unary encoding of a non-negative integer $m \leq n$ is an assignment of values to a sequence of $n$ Boolean variables $\langle m_1, \ldots, m_n \rangle$ such that $m_1 \geq m_2 \geq \cdots \geq m_n$. The integer value of such a representation is the number of variables $m_i$ taking value 1. For example, the sequence $\langle 11100000 \rangle$ represents the number $m = 3$ using $n = 8$ variables. For each pair of consecutive variables in the sequence, say $m_k$ and $m_{k+1}$, a clause $\langle \top, (\neg m_{k+1} \vee m_k) \rangle$ is introduced to the encoding in order to enforce that if $m_{k+1}$ is assigned 1 then its predecessor in the sequence, $m_k$, must be assigned 1. Let $i$ and $j$ be two non-negative integer variables that can be assigned values less than or equal to $n$. Let $\langle i_1, \ldots, i_n \rangle$ and $\langle j_1, \ldots, j_n \rangle$ be the sequences of $n$ Boolean variables that represent the unary encodings of $i$ and $j$ respectively. The unary encoding of $i \prec j$ is denoted by $\langle i_1, \ldots, i_n \rangle \prec \langle j_1, \ldots, j_n \rangle$, which means that the number of variables assigned the value 1 in the sequence $\langle i_1, \ldots, i_n \rangle$ is less than the number of variables assigned the value 1 in the sequence $\langle j_1, \ldots, j_n \rangle$. Notice that $\langle i_1, \ldots, i_n \rangle \prec \langle j_1, \ldots, j_n \rangle$ holds if and only if $\neg i_n$ holds, $j_1$ holds, and $\langle i_1, \ldots, i_n \rangle \preceq \langle j_2, \ldots, j_n, 0 \rangle$ holds. Here $\langle j_2, \ldots, j_n, 0 \rangle$ encodes an integer between 0 and $n - 1$, which is the predecessor of $\langle j_1, \ldots, j_n \rangle$. The inequality $\langle i_1, \ldots, i_n \rangle \preceq \langle j_2, \ldots, j_n, 0 \rangle$ can be encoded as follows:

Overall, the symbol-based unary encoding requires $\Theta(n^2)$ propositional variables ($n$ per feature) and involves $\Theta(kn)$ clauses ($n$ per precedence constraint), where $k = |H \cup P|$.
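The characterisation of strict precedence on unary encodings can be checked exhaustively for small $n$. The sketch below is an illustration under one assumption of ours: the non-strict comparison between two monotone bit sequences is tested position by position (each bit of the left sequence implies the corresponding bit of the right one), which is one natural way to express it and not necessarily the exact clause set used by Codish et al. (2009). Function names are hypothetical.

```python
def unary(m, n):
    """Unary encoding of m on n bits: m ones followed by n - m zeros."""
    return [1 if t < m else 0 for t in range(n)]

def unary_less(i_bits, j_bits):
    """not i_n, j_1, and <i_1..i_n> at most <j_2..j_n, 0>, compared bit by bit."""
    shifted = j_bits[1:] + [0]          # encodes the predecessor of j
    return (i_bits[-1] == 0 and j_bits[0] == 1
            and all(a <= b for a, b in zip(i_bits, shifted)))

n = 5
# Compare the encoded test with the integer comparison for every pair of values.
agrees = all(unary_less(unary(a, n), unary(b, n)) == (a < b)
             for a in range(n + 1) for b in range(n + 1))
example = unary(3, 8)
```

The exhaustive comparison confirms that, on valid unary encodings, the three conditions together coincide with the integer test $i < j$.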

Binary Encoding
In the symbol-based binary encoding each feature is associated with an ordered set of Boolean variables that represents the binary log encoding of its position. The binary encoding of a non-negative integer $a \leq n$ is a sequence of values assigned to $k$ variables $\langle v_1, \ldots, v_k \rangle$, where $k = \lceil \log_2 n \rceil$. The value of such a representation is $\sum_{1 \leq m \leq k} 2^{k-m} \times v_m$. For example, the sequence $\langle 101 \rangle$ represents the number 5 using 3 variables. A precedence constraint is encoded using a lexicographical comparator (Apt, 2003). Given two numbers in binary encoded form $\langle i_1, \ldots, i_k \rangle$ and $\langle j_1, \ldots, j_k \rangle$, a precedence constraint $\langle i_1, \ldots, i_k \rangle < \langle j_1, \ldots, j_k \rangle$ holds if and only if there exists $m > 0$ such that $i_m < j_m$ and for all $l < m$, $i_l = j_l$. The resulting encoding is not in conjunctive normal form. Therefore, the Tseitin transformation³ (Tseitin, 1968) is used to obtain the corresponding formula in conjunctive normal form. For a given precedence constraint, the Tseitin transformation introduces $\Theta(\log n)$ variables and clauses, since $\log n$ is the length of the formula associated with the given precedence constraint. Overall, the symbol-based binary encoding requires $\Theta(n \log n)$ propositional variables and involves $\Theta(k \log n)$ clauses, where $k = |H \cup P|$.
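The value formula and the lexicographical comparator can be exercised directly. This is a small sketch with hypothetical function names; it works on concrete bit lists rather than on propositional variables, so it illustrates the semantics of the comparator and not its clausal form.

```python
def binary(a, k):
    """Most-significant-bit-first encoding of a on k bits."""
    return [(a >> (k - m)) & 1 for m in range(1, k + 1)]

def value(bits):
    """Sum over m of 2^(k-m) * v_m, with positions m counted from 1."""
    k = len(bits)
    return sum(2 ** (k - m) * v for m, v in enumerate(bits, start=1))

def lex_less(i_bits, j_bits):
    """True iff some position m has i_m < j_m and all earlier bits are equal."""
    for a, b in zip(i_bits, j_bits):
        if a != b:
            return a < b
    return False

k = 3
# The lexicographic test agrees with integer comparison on all 3-bit values.
agrees = all(lex_less(binary(a, k), binary(b, k)) == (a < b)
             for a in range(2 ** k) for b in range(2 ** k))
five = binary(5, k)
```

Because the comparison stops at the first differing bit, a clausal version needs only a number of auxiliary variables proportional to the number of bits, in line with the logarithmic size stated above.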

Comparison of the Encodings
Unit Propagation (UP) is a central component of a search-based SAT solver. Given a unit clause $l$, unit propagation applies the following rules: (1) every clause containing $l$ is removed, and (2) $\neg l$ is removed from every clause that contains this literal. These rules are applied until a fixed point is reached. The application of these two rules leads to a new set of clauses that is equivalent to the old one. Unit propagation detects inconsistency when an empty clause is generated.
Let AE, SE$_u$, and SE$_b$ denote the atom-based encoding, the symbol-based unary encoding, and the symbol-based binary encoding respectively. The difference between these encodings is the way they encode acyclicity. In AE acyclicity is encoded explicitly by adding transitivity and asymmetricity clauses. In SE$_u$ and SE$_b$ acyclicity is encoded implicitly by associating each feature with a set of Boolean variables that represent its position (an integer value), and a precedence constraint is expressed in terms of these positions. The Boolean variables denoting the inclusion (or exclusion) of features and user precedences are called problem variables. These variables are common to all the encodings. An optimal relaxation can be expressed in terms of the problem variables. In order to show that unit propagation on one encoding is stronger than unit propagation on another encoding, we need to map the decisions of one encoding to the other one. Unfortunately, it is not possible to map the decisions between the atom-based and the symbol-based encodings. For example, an assignment of a position variable in the symbol-based encodings cannot be expressed in terms of the assignments to the variables of AE. Nevertheless, in the following, we prove that unit propagation in AE is stronger than unit propagation in SE$_b$ when the set of assignments is restricted to the problem variables.
Proposition 4. Given a set of assignments restricted to the problem variables, if unit propagation detects inconsistency in SE$_b$ then it also detects inconsistency in AE, but the converse is not true.
Proof. The atom-based and the symbol-based binary encodings differ only in the encoding of acyclicity, i.e., the encoding of the transitivity and asymmetricity properties of the precedence relation. In the symbol-based binary encoding the transitivity and asymmetricity properties are implicitly captured by the clauses corresponding to the propositional constraints of the form $bp_{ij} \Rightarrow i \prec j$. Therefore, in order to prove that if UP detects inconsistency in SE$_b$ then it also detects inconsistency in AE, it is sufficient to show that if $bp_{ij}$ is falsified due to violation of $i \prec j$ in SE$_b$ under unit propagation, the same happens in AE.
The clauses corresponding to $i \prec j$ are not defined in terms of the problem variables and none of these clauses is unary⁴. Therefore, UP cannot falsify $bp_{ij}$ in SE$_b$. This trivially implies that, when only a set of problem variables is instantiated, UP in AE detects any inconsistency that is detected by UP in SE$_b$. Now we show that there exists a case where an inconsistency is detected by UP in AE but not in SE$_b$. Let $F = \{i, j, k\}$ be a set of features, $H = \emptyset$, and $P = \{i \prec j, j \prec k, k \prec i\}$ be a set of user precedence constraints. In all the encodings we have a Boolean variable per user precedence constraint, $bp_{ij}$, $bp_{jk}$ and $bp_{ki}$, and we assume that all three are set to true. In AE the unit resolution of $bp_{ij}$ and $bp_{jk}$ with the transitivity clause $\neg bp_{ij} \vee \neg bp_{jk} \vee bp_{ik}$ yields $bp_{ik}$, and the unit resolution of $bp_{ik}$ with $\neg bp_{ki} \vee \neg bp_{ik}$ yields $\neg bp_{ki}$, which results in an empty clause when resolved with $bp_{ki}$. In SE$_b$, an ordered set of Boolean variables is associated with each feature. As there are 3 features, two Boolean variables are required per feature. Therefore the features $i$, $j$ and $k$ are associated with $\langle i_1, i_2 \rangle$, $\langle j_1, j_2 \rangle$, and $\langle k_1, k_2 \rangle$ respectively, which are used to encode the precedence constraints. For each precedence constraint, say $i \prec j$, a set of clauses that encode the propositional constraint $bp_{ij} \Rightarrow (\neg i_1 \wedge j_1) \vee ((i_1 \Leftrightarrow j_1) \wedge (\neg i_2 \wedge j_2))$ is also added. The formulae associated with $j \prec k$ and $k \prec i$ are encoded similarly. Although $bp_{ij}$ and $bp_{jk}$ are set to true, UP does not infer $\neg bp_{ik}$, since none of the clauses obtained by applying the Tseitin transformation is unary. Therefore, unlike AE, SE$_b$ does not detect the inconsistency.
Thus, we can infer that if unit propagation detects inconsistency in SE$_b$ then it also detects inconsistency in AE, but the converse is not true.
Given a set of assignments restricted to the problem variables, if unit propagation detects inconsistency in SE$_u$ then it also detects inconsistency in AE, and the converse is also true. This follows directly from the explanation of the symbol-based unary encoding and the atom-based encoding. Notice that both encodings detect cycles consisting of two features of the form $i \prec j$ and $j \prec i$. If a cycle involves more than two features, e.g., $i \prec j$, $j \prec k$, $k \prec i$, both encodings will infer $i \prec k$, which results in a cycle consisting of the two features $i$ and $k$.
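The atom-based half of the argument can be replayed with a few lines of code. The sketch below implements the two unit propagation rules on clauses represented as sets of integer literals (a negative integer denotes a negated variable) and applies them to the three-feature cycle from the proof. The numbering of the variables is a hypothetical choice: 1 for $bp_{ij}$, 2 for $bp_{jk}$, 3 for $bp_{ki}$, 4 for $bp_{ik}$.

```python
def unit_propagate(clauses):
    """Return False if an empty clause is derived, True once no unit clause is left."""
    clauses = [set(c) for c in clauses]
    while True:
        if any(len(c) == 0 for c in clauses):
            return False
        units = [next(iter(c)) for c in clauses if len(c) == 1]
        if not units:
            return True
        lit = units[0]
        # Rule 1: drop clauses containing lit. Rule 2: delete the negation elsewhere.
        clauses = [c - {-lit} for c in clauses if lit not in c]

assignments = [{1}, {2}, {3}]        # bp_ij, bp_jk and bp_ki set to true
transitivity = {-1, -2, 4}           # not bp_ij or not bp_jk or bp_ik
asymmetry = {-3, -4}                 # not bp_ki or not bp_ik

with_transitivity = unit_propagate(assignments + [transitivity, asymmetry])
without_transitivity = unit_propagate(assignments + [asymmetry])
```

With the transitivity clause present, propagation derives $bp_{ik}$ and then the empty clause; without it, no conflict is found, which mirrors why an encoding that lacks such short clauses over the problem variables cannot detect the cycle by propagation alone.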

Mixed Integer Linear Programming
In linear programming the goal is to optimise an objective function subject to linear equality and inequality constraints. When some variables are forced to be integer-valued, the problem is called a Mixed Integer Linear Programming (MIP) problem. The standard way of expressing these problems is by presenting the function to be optimised, the linear constraints to be respected and the domains of the variables involved. Both the basic COP formulation and the atom-based PWMSAT formulation for finding an optimal relaxation of a feature subscription $\langle F, H, P, w \rangle$ can be translated into a MIP formulation. The translation of the PWMSAT formulation into MIP is straightforward. For this particular formulation we observed that CPLEX was not able to solve even simple problems within a time limit of 4 hours. In this paper, we only present the MIP formulation that corresponds to the basic COP formulation as presented in Section 2.2.
Variables. For each $i \in F$, we use a binary variable $bf_i$ and a real variable $pf_i$. A binary variable $bf_i$ is equal to 1 or 0 depending on whether feature $i$ is included or not. A real variable $pf_i$, $1 \leq pf_i \leq |F|$, is used to determine the position of the feature $i$ in the computed subscription if $bf_i$ is set to 1. For each user precedence constraint $(i \prec j) \in P$, we use a binary variable $bp_{ij}$. It is instantiated to 1 or 0 depending on whether the precedence constraint $i \prec j$ holds or not.
Linear Inequalities. If the features $i$ and $j$ are included in the computed subscription and if $(i \prec j) \in H$ then the position of feature $i$ must be less than the position of feature $j$.
To this effect, we need to translate the underlying implication $(bf_i \wedge bf_j \Rightarrow (pf_i < pf_j))$ into the following linear inequality:

$$pf_i - pf_j + n \cdot bf_i + n \cdot bf_j \leq 2n - 1. \qquad (6)$$

Here, $n$ is a constant that is equal to the number of features, $|F|$, selected by the user. When both $bf_i$ and $bf_j$ are 1, Inequality (6) will force $(pf_i < pf_j)$. Note that this is not required for any user precedence constraint $(i \prec j) \in P$, since it can be violated.
A user precedence $(i \prec j) \in P$ is equivalent to the implication $bp_{ij} \Rightarrow (pf_i < pf_j) \wedge bf_i \wedge bf_j$, which in turn is equivalent to the conjunction of the three implications $(bp_{ij} \Rightarrow (pf_i < pf_j))$, $(bp_{ij} \Rightarrow bf_i)$ and $(bp_{ij} \Rightarrow bf_j)$. These implications can be translated into the following inequalities:

$$pf_i - pf_j + n \cdot bp_{ij} \leq n - 1, \qquad (7)$$

$$bp_{ij} - bf_i \leq 0, \qquad (8)$$

$$bp_{ij} - bf_j \leq 0. \qquad (9)$$

Inequality (7) means that $bp_{ij} = 1$ forces $pf_i < pf_j$ to be true. Also, if $bp_{ij} = 1$ then both $bf_i$ and $bf_j$ are equal to 1 from Inequalities (8) and (9) respectively.
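Objective Function. The objective is to find an optimal relaxation of a feature subscription configuration problem $\langle F, H, P, w \rangle$ that maximises the sum of the weights of the features and the user precedence constraints that are selected:

$$\text{Maximise} \quad \sum_{i \in F} w(i) \cdot bf_i + \sum_{(i \prec j) \in P} w(i \prec j) \cdot bp_{ij}.$$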
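The linearisation of the two implications can be sanity-checked by enumeration. The sketch below assumes a standard big-M form with $n$ as the constant, namely $pf_i - pf_j + n \cdot bf_i + n \cdot bf_j \leq 2n - 1$ for catalogue precedences and $pf_i - pf_j + n \cdot bp_{ij} \leq n - 1$, $bp_{ij} \leq bf_i$, $bp_{ij} \leq bf_j$ for user precedences; these forms and the function names are assumptions made for the illustration. Positions are enumerated over the integers $1, \ldots, n$ for simplicity, whereas the model uses real variables, for which the inequalities enforce a gap of at least 1.

```python
from itertools import product

def catalogue_ok(pf_i, pf_j, bf_i, bf_j, n):
    # Assumed big-M form of: bf_i and bf_j imply pf_i < pf_j.
    return pf_i - pf_j + n * bf_i + n * bf_j <= 2 * n - 1

def user_ok(pf_i, pf_j, bp_ij, bf_i, bf_j, n):
    # Assumed big-M form of: bp_ij implies pf_i < pf_j, bf_i and bf_j.
    return (pf_i - pf_j + n * bp_ij <= n - 1
            and bp_ij - bf_i <= 0 and bp_ij - bf_j <= 0)

n = 4
positions = range(1, n + 1)
catalogue_mismatches = 0
user_mismatches = 0
for pi, pj, bi, bj in product(positions, positions, (0, 1), (0, 1)):
    implied = (not (bi and bj)) or pi < pj
    catalogue_mismatches += catalogue_ok(pi, pj, bi, bj, n) != implied
    for bp in (0, 1):
        implied_user = (not bp) or (bi and bj and pi < pj)
        user_mismatches += user_ok(pi, pj, bp, bi, bj, n) != bool(implied_user)
```

When a selection variable is 0, the constant $n$ relaxes the inequality because two positions in $[1, n]$ never differ by more than $n - 1$.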

Experimental Results
In this section, we shall describe the empirical evaluation of finding an optimal relaxation of randomly generated feature subscriptions using constraint programming, partial weighted maximum Boolean satisfiability and integer linear programming.

Problem Generation and Experimental Settings
In order to compare the different approaches we generated and experimented with a variety of random catalogues and many classes of random feature subscriptions. All the random selections below are performed with uniform distributions. A random catalogue is defined by a tuple $\langle f_c, B_c, T_c \rangle$. Here, $f_c$ is the number of features, $B_c$ is the number of binary constraints and $T_c \subseteq \{\prec, \succ, \prec\succ\}$ is a set of types of constraints. Note that $i \prec\succ j$ means that in any given subscription both features $i$ and $j$ cannot exist together. A random catalogue is generated by selecting $B_c$ pairs of features randomly from the $f_c(f_c - 1)/2$ pairs of features. Each selected pair of features is then associated with a type of constraint that is selected randomly from $T_c$. A random feature subscription is defined by a tuple $\langle f_u, p_u, w \rangle$. Here, $f_u$ is the number of features that are selected randomly from the $f_c$ features, $p_u$ is the number of user precedence constraints between the pairs of features that are selected randomly from the $f_u(f_u - 1)/2$ pairs of features, and $w$ is an integer greater than 0. Each feature and each user precedence constraint is associated with an integer weight that is selected randomly between 1 and $w$ inclusive. We generated catalogues of the following forms: $\langle 50, 250, \{\prec, \succ\} \rangle$, $\langle 50, 500, \{\prec, \succ, \prec\succ\} \rangle$ and $\langle 50, 750, \{\prec, \succ\} \rangle$. For each random catalogue, we generated classes of feature subscriptions of the following forms: $\langle 10, 5, 4 \rangle$, $\langle 15, 20, 4 \rangle$, $\langle 20, 10, 4 \rangle$, $\langle 25, 40, 4 \rangle$, $\langle 30, 20, 4 \rangle$, $\langle 35, 35, 4 \rangle$, $\langle 40, 40, 4 \rangle$, $\langle 45, 90, 4 \rangle$ and $\langle 50, 5, 4 \rangle$. Note that $\langle 50, 250, \{\prec, \succ\} \rangle$ is the default catalogue and the value of $w$ is 4 by default, unless stated otherwise. For each class of each catalogue, 10 instances of feature subscriptions were generated and their mean results are reported in the paper⁵. We remark that only 4 randomly generated instances were consistent out of the 270 generated instances. These consistent instances are instances of the feature subscription class $\langle 10, 5, 4 \rangle$ of catalogue $\langle 50, 250, \{\prec, \succ\} \rangle$.
All the experiments were performed on a Pentium 4 PC (1.8 GHz CPU and 768 MB of RAM). The performance of all the approaches is measured in terms of search nodes (#nodes) and runtime in seconds (time). The time reported is the time spent both finding the optimal solution and proving optimality. We used a time limit of 14,400 seconds (i.e., 4 hours) to cut the search. No initial bounds were computed for any of the approaches.
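The generation procedure can be sketched as follows. This is an illustrative reading of the description, not the generator used for the experiments: the function names, the string labels for the constraint types, the seed, and in particular the random choice of direction for each user precedence are assumptions.

```python
import random

def random_catalogue(f_c, b_c, types, rng):
    """Pick b_c distinct feature pairs and give each a random constraint type."""
    pairs = [(i, j) for i in range(f_c) for j in range(i + 1, f_c)]
    chosen = rng.sample(pairs, b_c)
    return {pair: rng.choice(types) for pair in chosen}

def random_subscription(f_c, f_u, p_u, w, rng):
    """Pick f_u features, p_u user precedences among them, and weights in 1..w."""
    features = rng.sample(range(f_c), f_u)
    pairs = [(i, j) for a, i in enumerate(features) for j in features[a + 1:]]
    # Assumption: the direction of each user precedence is chosen at random.
    prec = [p if rng.random() < 0.5 else p[::-1] for p in rng.sample(pairs, p_u)]
    w_f = {i: rng.randint(1, w) for i in features}
    w_p = {p: rng.randint(1, w) for p in prec}
    return features, prec, w_f, w_p

rng = random.Random(0)  # fixed seed so that the sketch is repeatable
catalogue = random_catalogue(50, 250, ["<", ">"], rng)
features, prec, w_f, w_p = random_subscription(50, 10, 5, 4, rng)
```

The two calls correspond to the default catalogue with 50 features and 250 constraints and to the smallest subscription class with 10 features and 5 user precedences.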

Evaluation of Constraint Programming Formulations
For the basic constraint optimisation problem (COP) model as presented in Section 7, we first investigated the effect of Maintaining Arc Consistency (MAC) within branch and bound search. We also studied the effect of maintaining different levels of consistency on different sets of variables within a problem. In particular we investigated (1) maintaining singleton arc consistency (SAC) on the Boolean variables and MAC on the remaining variables (see Section 7.2), and (2) maintaining restricted singleton arc consistency (RSAC) on the Boolean variables and MAC on the remaining variables (see Section 7.3); the former is denoted by MSAC$_b$ and the latter by MRSAC$_b$. All the branch and bound search algorithms were tested with two different variable ordering heuristics: dom/deg (Bessiere & Regin, 1996) and dom/wdeg (Boussemart, Hemery, Lecoutre, & Sais, 2004). Here dom is the domain size, deg is the original degree of a variable, and wdeg is the weighted degree of a variable. All the experiments for the basic COP formulation were done using choco$^6$ (version 2.1), a Java library for constraint programming. Results for the three branch and bound search algorithms are presented in Table 2 for the dom/deg variable ordering heuristic and in Table 3 for the dom/wdeg variable ordering heuristic.
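To make the two variable ordering heuristics concrete, the following Python sketch selects the unassigned variable with the smallest ratio of domain size to (weighted) degree. It is a simplified illustration, not the choco implementation: the data layout and the function name `select_variable` are assumptions, and the sum of weights here runs over all constraints on a variable, whereas the published dom/wdeg heuristic only counts constraints that involve at least one other unassigned variable.

```python
def select_variable(domains, scopes, weights):
    """Pick the unassigned variable minimising |dom| / wdeg.

    domains: {variable: set of remaining values}
    scopes:  list of tuples of variables, one tuple per constraint
    weights: list of ints, one per constraint; a solver would start them at 1
             and increment a constraint's weight each time it causes a wipe-out.
    With all weights equal to 1 the ratio reduces to dom/deg.
    """
    def wdeg(var):
        # Simplification: sum the weights of every constraint on var.
        return sum(w for scope, w in zip(scopes, weights) if var in scope)

    unassigned = [v for v, dom in domains.items() if len(dom) > 1]
    # max(..., 1) guards against variables that appear in no constraint.
    return min(unassigned, key=lambda v: len(domains[v]) / max(wdeg(v), 1))


if __name__ == "__main__":
    domains = {"x": {0, 1}, "y": {0, 1, 2}, "z": {0, 1}}
    scopes = [("x", "y"), ("y", "z"), ("x", "z")]
    print(select_variable(domains, scopes, [1, 1, 1]))  # dom/deg behaviour
    print(select_variable(domains, scopes, [1, 5, 1]))  # after failures on (y, z)
```

The second call shows the intended effect of the weights: a constraint that has failed often draws the search towards its variables.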
Tables 2 and 3 clearly show that maintaining (R)SAC on the Boolean variables and AC on the integer variables dominates maintaining AC on all the variables. To the best of our knowledge this is the first time that such a significant improvement has been observed by maintaining a partial form of singleton arc consistency during search. As the problem size increases, the difference in the number of nodes visited by MRSAC$_b$ and MSAC$_b$ increases. Note that MRSAC$_b$ usually visits more nodes than MSAC$_b$, but the difference between them is small. This suggests that the level of consistency enforced by RSAC on the instances of the feature subscription problem is very close to that enforced by SAC. Despite visiting more nodes, MRSAC$_b$ usually requires less time than MSAC$_b$. On average, all three search algorithms perform better with the dom/wdeg heuristic than with the dom/deg heuristic. In the remainder of the paper, the results that correspond to the basic COP model are obtained using MRSAC$_b$ with the dom/wdeg variable ordering heuristic.
We remark that the underlying algorithms in MAC and MRSAC$_b$, which enforce AC and RSAC$_b$ respectively, have an optimal worst-case time complexity. However, the underlying algorithm of MSAC$_b$, which enforces SAC$_b$, does not. Implementing an algorithm to enforce SAC$_b$ that has an optimal worst-case time complexity is not only cumbersome but also has a higher space requirement. The works of Bessiere et al. (2004, 2005) provide evidence that when an optimal algorithm for enforcing SAC is used as a preprocessor it is very expensive in terms of both running time and space. Therefore, maintaining it during search, as in our case, could be even more expensive. There exist other sub-optimal but efficient algorithms for enforcing singleton arc consistency on constraint networks, as proposed by Lecoutre et al. (2005), and it remains to be seen whether any of these algorithms can reduce the running time of MSAC$_b$.

6. http://choco.sourceforge.net/
Notice that SAC$_b$ can prune more values than RSAC$_b$. However, in practice, the difference between their pruning on the instances of feature subscription is small, as is evident from the number of nodes and the time shown in Tables 2 and 3. We recall that RSAC$_b$ enforces SAC$_b$ only partially. At a given node in the search tree, RSAC$_b$ enforces arc consistency at most once for each assignment of a value to each Boolean variable, whereas SAC$_b$ can enforce arc consistency at most $n$ times in the worst case. Here $n$ is the total number of Boolean variables associated with features and user precedences. Nevertheless, in practice, we observed that it was much less. For example, for any instance of the feature subscription class $\langle 40, 40 \rangle$, arc consistency was enforced at most 7 times for any variable-value pair, which is much less than $n = 80$. This also justifies the use of a non-optimal algorithm to enforce SAC$_b$.
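The difference between the two consistency levels can be illustrated with a small Python sketch. It is a toy reconstruction under stated assumptions, not the algorithms of Sections 7.2 and 7.3: the propagator is a naive arc-consistency loop over binary constraints, and the names `make_propagator`, `rsac_on_booleans` and `sac_on_booleans` are hypothetical. The restricted variant tests each Boolean variable-value pair once, whereas the full variant repeats the pass until no further value is removed.

```python
def make_propagator(constraints):
    """Build a naive arc-consistency propagator for binary constraints.

    constraints: list of (x, y, pred) where pred(vx, vy) is True when allowed.
    The returned function prunes `domains` in place and returns False on a wipe-out.
    """
    def propagate(domains):
        changed = True
        while changed:
            changed = False
            for x, y, pred in constraints:
                keep_x = {a for a in domains[x] if any(pred(a, b) for b in domains[y])}
                keep_y = {b for b in domains[y] if any(pred(a, b) for a in keep_x)}
                if keep_x != domains[x] or keep_y != domains[y]:
                    domains[x], domains[y] = keep_x, keep_y
                    changed = True
                if not keep_x or not keep_y:
                    return False
        return True
    return propagate


def rsac_on_booleans(domains, bool_vars, propagate):
    """Restricted SAC: test each Boolean (variable, value) pair at most once."""
    for var in bool_vars:
        for value in sorted(domains[var]):
            trial = {v: set(d) for v, d in domains.items()}
            trial[var] = {value}
            if not propagate(trial):  # the singleton test failed
                domains[var].discard(value)
                if not domains[var] or not propagate(domains):
                    return False
    return True


def sac_on_booleans(domains, bool_vars, propagate):
    """Full SAC on the Booleans: repeat the pass until no value is removed."""
    if not propagate(domains):
        return False
    while True:
        before = sum(len(domains[v]) for v in bool_vars)
        if not rsac_on_booleans(domains, bool_vars, propagate):
            return False
        if sum(len(domains[v]) for v in bool_vars) == before:
            return True


if __name__ == "__main__":
    # b = 1 forces x = 0 and y = 1, which contradicts x = y.
    propagate = make_propagator([
        ("b", "x", lambda b, x: b == 0 or x == 0),
        ("b", "y", lambda b, y: b == 0 or y == 1),
        ("x", "y", lambda x, y: x == y),
    ])
    domains = {"b": {0, 1}, "x": {0, 1}, "y": {0, 1}}
    print(rsac_on_booleans(domains, ["b"], propagate), domains["b"])
```

In this toy network plain arc consistency prunes nothing, while a single singleton test on `b` removes the value 1.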
Our WCSP formulation for finding an optimal relaxation of a feature subscription was also tested. For this purpose toulbar2 (a generic solver for WCSP) was used.$^7$ In general the results in terms of time were poor. We remark that a solution of the WCSP model is a total order on the features whose position variables are assigned values greater than 0. Due to holes (when a feature is excluded), different assignments of the position variables may lead to the same total order. Thus, more search effort could be spent for the WCSP formulation. We recall that in the basic COP model the decision variables are only the Boolean variables that indicate the inclusion or exclusion of features and user precedences, and not the position variables. Therefore, an optimal solution of the basic COP model may not necessarily be a total order on the included features. Nevertheless, a total order can be obtained by computing a topological sort on the included user precedences and the catalogue precedences defined over the included features.
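The final step, turning the included features and precedences into a total order, can be sketched with the standard library's `graphlib` module (Python 3.9 or later). The function name `total_order` and the representation of a precedence as a pair `(i, j)`, meaning that `i` must come before `j`, are assumptions made for this example.

```python
from graphlib import TopologicalSorter


def total_order(features, precedences):
    """Return one total order on `features` that respects every (before, after) pair.

    Raises graphlib.CycleError if the precedences are cyclic, which cannot
    happen for a consistent relaxation.
    """
    sorter = TopologicalSorter({f: set() for f in features})
    for before, after in precedences:
        sorter.add(after, before)  # `after` depends on `before`
    return list(sorter.static_order())


if __name__ == "__main__":
    included = ["voice-mail", "call-divert", "do-not-disturb"]
    precs = [("do-not-disturb", "call-divert"), ("call-divert", "voice-mail")]
    print(total_order(included, precs))
```

When the precedences leave several orders possible, any one of them is a valid sequencing of the relaxed subscription.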
In order to remove these symmetries, the WCSP formulation described in Section 8.2 can be augmented. One way could be to associate costs with the values (greater than 0) of the position variables in such a way that, for a given strict partial order, there is a unique optimal assignment of values to the variables. Our preliminary investigation suggested that the number of nodes was reduced, but at the expense of increased time. In our current setting, the WCSP approach has been used as a black box. Certain improvements could be made which may improve the performance in terms of time. For example, stronger soft consistency techniques could be applied, similar to the singleton arc consistency used for the COP model, which proved efficient for the feature subscription problem.
We also investigated the impact of using the global constraint SoftPrec. This global constraint was implemented in choco. The results obtained by using it are denoted by SP. Five variants of SoftPrec have been investigated by Lesaint et al. (2009). The results presented in this paper correspond to the variant that was observed to be the best in terms of time, which Lesaint et al. (2009) denoted by SP$_4$. The results in Tables 6-8 show that SoftPrec always outperforms MRSAC$_b$ on average. However, Lesaint et al. (2008a) theoretically showed that the pruning achieved by maintaining RSAC on the Boolean variables of the COP model and AC on the remaining variables is incomparable with the pruning achieved by using SoftPrec.

7. http://carlit.toulouse.inra.fr/cgi-bin/awki.cgi/ToolBarIntro

Evaluation of the Boolean Satisfiability Formulations
The evaluation of the atom-based PWMSAT encoding of feature subscription was carried out on three different solvers: (a) sat4j$^8$ (version 2.1.1), an efficient library of SAT solvers in Java that implements the minisat specification (Eén & Sörensson, 2003); (b) minisat+$^9$ (version 1.13+), a pseudo-Boolean solver implemented on top of minisat (Eén & Sörensson, 2006); and (c) clasp$^{10}$ (version 1.3.0), an answer set solver that supports Boolean constraint solving (Gebser, Kaufmann, & Schaub, 2009). As the last two solvers are pseudo-Boolean solvers, the PWMSAT instances were translated into linear pseudo-Boolean instances by associating each clause with a linear pseudo-Boolean constraint, and defining the objective function as the weighted sum of the soft clauses in the PWMSAT model (de Givry, Larrosa, Meseguer, & Schiex, 2003).
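One common way to carry out such a translation is sketched below in Python. It should be read as an assumption about the general technique rather than as the exact translation used in the experiments: each clause becomes an inequality over 0/1 variables, a negated literal contributes $1 - x$, and each soft clause receives a fresh relaxation variable whose weight is added to an objective to be minimised. The function names and the DIMACS-style integer literals are choices made for this example.

```python
def clause_to_pb(clause):
    """Translate a clause into (coeffs, rhs), meaning sum(coeffs[v] * x_v) >= rhs.

    A clause is a list of non-zero ints: v stands for x_v, -v for (not x_v).
    A negative literal contributes (1 - x_v); its constant 1 moves to the
    right-hand side.
    """
    coeffs, rhs = {}, 1
    for lit in clause:
        var = abs(lit)
        if lit > 0:
            coeffs[var] = coeffs.get(var, 0) + 1
        else:
            coeffs[var] = coeffs.get(var, 0) - 1
            rhs -= 1
    return coeffs, rhs


def pwmsat_to_pb(hard, soft, n_vars):
    """Return (objective, constraints) for a partial weighted MaxSAT instance.

    hard: list of clauses; soft: list of (weight, clause); n_vars: number of
    variables already in use. The objective {var: weight} is to be minimised
    and equals the total weight of the relaxed soft clauses.
    """
    constraints = [clause_to_pb(c) for c in hard]
    objective = {}
    next_var = n_vars
    for weight, clause in soft:
        next_var += 1  # fresh relaxation variable for this soft clause
        constraints.append(clause_to_pb(list(clause) + [next_var]))
        objective[next_var] = weight
    return objective, constraints


if __name__ == "__main__":
    objective, constraints = pwmsat_to_pb([[1, -2]], [(3, [2])], n_vars=2)
    print(objective, constraints)
```

When every soft clause is a unit clause, the relaxation variables can be dropped and the objective written directly over the literals of those clauses.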
The results of the evaluation are summarised in Table 4. We remark that the results for the sat4j solver, especially for the dense catalogues, are roughly 10 times faster when compared to those presented by Lesaint et al. (2008c). This is simply due to the advances in the version of sat4j that has been used to obtain the results. Despite that, sat4j is significantly outperformed by both minisat+ and clasp. We observed a gap of up to one order of magnitude in those cases where the catalogue is sparse. clasp and minisat+ seem to be incomparable on our instances. Even though clasp performed better on our toughest category of instances, $\langle 45, 90 \rangle$, clasp spent 27% more time solving the whole set of instances. We also noticed that clasp seems to be more sensitive to the number of features in sparse instances. While we observed a gap of one order of magnitude between categories $\langle 45, 90 \rangle$ and $\langle 50, 4 \rangle$ in the $\langle 50, 250, \{\prec, \succ\} \rangle$ catalogue with sat4j and minisat+, the gap observed with clasp was much smaller.

We now compare the atom-based encoding with the symbol-based unary and binary encodings described in Section 9.2. In order to make a fair comparison between these encodings we need to solve the same instances of feature subscription on the same machine using the same solver. As we did not have access to the instances of feature subscription for the SE$_u$ and SE$_b$ encodings, we use the results of the experiments run by Daniel Le Berre$^{11}$ for all three encodings (AE, SE$_u$ and SE$_b$) on the same instances of feature subscription, using the sat4j solver (version 2.1.0) on a PC with a Pentium 4 processor (3 GHz CPU). Codish et al. (2009) have also made these results public.
Table 5 presents results for feature subscriptions of different sizes and different catalogues for three encodings: AE, SE$_u$ and SE$_b$. The experimental results show that AE is, in general, more efficient than SE$_b$, which is consistent with the fact that unit propagation on AE is strictly stronger than unit propagation on SE$_b$. Note that AE is up to two orders of magnitude faster than SE$_b$. Notice also that on no class of feature subscription does SE$_b$ outperform both SE$_u$ and AE. Although the results reported in Tables 1, 2 and 3 of the works of Codish et al. (2008, 2009) suggest that SE$_b$ is much better than AE, the results shown in Table 5 contradict this conclusion: the results obtained by using SE$_b$ are significantly outperformed by those obtained by using AE. This apparent conflict could have several causes. First, the results reported by Codish et al. (2008) were based on different instances for different encodings; the instances used for the symbol-based encoding were very much easier, and in fact some large instances with 50 features were already consistent. Second, the experiments for different encodings were conducted on different machines. Third, Codish et al. (2008, 2009) obtained the results for the symbol-based encoding and the atom-based encoding using different solvers. The experiments for SE$_b$ were done using a solver implemented on top of minisat, while for AE the results were obtained using the sat4j solver. It is apparent from Table 4 that the use of different solvers can make a huge difference in terms of runtime. In fact, we have observed a huge improvement for AE when tested with the minisat+ solver. This latter fact suggests that the speed-up observed by Codish et al. (2008, 2009) could be mostly due to the use of minisat. Also, notice that the results depicted in Table 5 are in accordance with the fact that unit propagation in the atom-based encoding is strictly stronger than unit propagation in the symbol-based binary encoding.
Although unit propagation on the AE encoding is equivalent to unit propagation on the SE$_u$ encoding when assignments are restricted to problem variables, empirically it is not always possible to observe this because the search trees are explored in different orders. Table 5 shows that AE and SE$_u$ are incomparable in terms of time. Therefore, it is not possible to conclude that either of the two approaches is superior. We have also been informed that the instances of the symbol-based encodings include clauses for the computation of the objective function and for the comparison of its value with an upper bound, as described by Codish et al. (2009). However, these clauses are not needed when applying the PWMSAT solver of sat4j. These extra clauses may indeed prevent the symbol-based approaches from performing at their best. Nevertheless, most of the clauses of the symbol-based encodings come from the encoding of the precedence constraints.
Finding an optimal relaxation of a feature subscription using a SAT solver can be decomposed into three tasks: (a) the encoding of the strict partial order, (b) the encoding of the objective function, and (c) the underlying search algorithm of the SAT solver. Improving any of these tasks can improve the whole approach for solving the problem. In this paper we have focused on task (a), which is mainly about the encoding of the precedence constraints. We remark that (a), (b) and (c) are orthogonal tasks, so any of the techniques for tasks (b) and (c) can be used with any of the techniques for task (a). The different encodings of precedence constraints can be fairly compared when the same (or the best suited) techniques for tasks (b) and (c) are used. Codish et al. (2008, 2009) propose several techniques for (b) and (c), e.g., the encoding of the sum constraint and the use of dichotomic search for the optimisation aspect. It may be possible to improve the results of the atom-based encoding further by using these techniques.
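As a generic illustration of dichotomic search for the optimisation aspect, the Python sketch below performs a binary search on the value of the relaxation. It is not the procedure of Codish et al.; `is_feasible` is a hypothetical stand-in for a call to a SAT solver that asks whether a relaxation of value at least `v` exists, and the sketch assumes that feasibility is monotone (if a value is achievable, so is every smaller one) and that the initial lower bound is achievable.

```python
def dichotomic_maximise(is_feasible, low, high):
    """Return the largest v in [low, high] such that is_feasible(v) holds.

    Assumes is_feasible(low) is True and that is_feasible(v) implies
    is_feasible(u) for every low <= u <= v.
    """
    while low < high:
        mid = (low + high + 1) // 2  # round up so the interval always shrinks
        if is_feasible(mid):
            low = mid       # a relaxation of value mid exists; search above it
        else:
            high = mid - 1  # no relaxation of value mid; search below it
    return low


if __name__ == "__main__":
    calls = []

    def oracle(v):
        # Toy oracle: pretend the optimal relaxation has value 37.
        calls.append(v)
        return v <= 37

    print(dichotomic_maximise(oracle, 0, 100), len(calls))
```

Each call to the oracle halves the interval, so the number of solver calls grows logarithmically with the range of the objective.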

Comparison between CP, SAT and MIP-based approaches
The performance of the constraint programming (CP), partial weighted maximum satisfiability (SAT) and mixed integer linear programming (MIP) approaches is presented in Tables 6, 7 and 8. The MIP model of the problem was solved using ILOG CPLEX$^{12}$ (version 10.1). For the CP approaches the results are presented for MRSAC$_b$ and for the global constraint, denoted by SP. For the SAT approaches we use the results obtained with clasp and minisat+. All the approaches solved all the instances within the time limit. Since finding an optimal relaxation is NP-hard in general, we need to investigate which approach can do it in reasonable time. The best approach in terms of time is shown in bold for each class of feature subscription. The results presented in Table 6 suggest that the MIP approach performs better than the CP and SAT approaches on the hardest feature subscription instances of the sparse catalogue $\langle 50, 250, \{\prec, \succ\} \rangle$, in particular on the $\langle 45, 90 \rangle$ and $\langle 50, 4 \rangle$ classes of feature subscriptions; for the remaining classes of this catalogue, the SAT approach based on the clasp solver is the winner. For the dense catalogue $\langle 50, 750, \{\prec, \succ\} \rangle$, the MIP approach is significantly slower than the other approaches. Notice that the results for the MIP approach have improved significantly when compared with the results presented by Lesaint et al. (2008c). This is because of the use of real-valued variables for the positions of features. The results presented in Tables 7 and 8 for the catalogues $\langle 50, 500, \{\prec, \succ, \prec\succ\} \rangle$ and $\langle 50, 750, \{\prec, \succ\} \rangle$, respectively, suggest that the SAT approaches perform significantly better than the MIP and CP approaches. In particular, the SAT approach based on the clasp solver is the winner for all the classes except the $\langle 50, 4 \rangle$ class of feature subscription of catalogue $\langle 50, 750, \{\prec, \succ\} \rangle$, where it is outperformed by the CP approach based on the global constraint and by the SAT approach based on minisat+.
Even though MRSAC$_b$ and SoftPrec are outperformed by at least one of the other approaches in all the cases, they are never the worst with respect to the total time required for solving all the instances, as shown in Figure 3. In particular, the CP approach based on SoftPrec is very competitive in those cases where the catalogue is dense. Figure 3 also shows that the pseudo-Boolean solvers clasp and minisat+ perform better in terms of total time when compared with the other approaches. It should be noted that clasp and minisat+ are implemented in C++ and use restarts, while MRSAC$_b$ and SoftPrec are implemented in the Java-based choco solver and do not use restarts. Both clasp and minisat+ perform poorly with respect to the number of nodes visited during search. This shows that the time spent by clasp and minisat+ at each node is considerably less than the time spent by the remaining approaches. There is of course the opportunity to improve the per-node speed of the CP approaches by implementing them in a C++ based solver. We also remark that both clasp and minisat+ consume more memory than the CP-based approaches and the MIP approach. To illustrate this, we computed the sum of the problem sizes of all the instances for all the approaches. Here, the problem size of an instance is the sum of the number of variables, the domain sizes of all the variables, and the arities of all the constraints. Figure 4 plots the total problem size for each approach. The total problem size for clasp and minisat+ is roughly two orders of magnitude larger than for the other approaches. We therefore conclude that clasp and minisat+ do not offer scalability.
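The size measure defined above is simple enough to state as a short Python function. The sketch assumes a generic representation in which a model is given by its variable domains and the scopes of its constraints; the name `problem_size` and this data layout are illustrative choices, not part of any of the solvers used.

```python
def problem_size(domains, scopes):
    """Number of variables + sum of domain sizes + sum of constraint arities.

    domains: {variable: collection of values}
    scopes:  list of tuples of variables, one tuple per constraint (or clause)
    """
    n_variables = len(domains)
    total_domain_size = sum(len(values) for values in domains.values())
    total_arity = sum(len(scope) for scope in scopes)
    return n_variables + total_domain_size + total_arity


if __name__ == "__main__":
    # A tiny model: two variables and one binary constraint.
    print(problem_size({"x": {0, 1}, "y": {0, 1, 2}}, [("x", "y")]))
    # A clausal model: every variable is Boolean, every clause is a constraint.
    clauses = [(1, 2), (2, 3), (1, 2, 3)]
    booleans = {v: (0, 1) for v in (1, 2, 3)}
    print(problem_size(booleans, clauses))
```

Under this measure a clausal encoding pays for every literal occurrence, which is one way to see why encodings with many clauses lead to large totals.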

Conclusions and Future Work
In this paper we have focussed on the task of finding an optimal relaxation of a feature subscription when the user's preferences violate the technical constraints defined by a set of distributed feature composition rules. We reformulated the problem of finding an optimal relaxation, and showed that it is a generalisation of the Feedback Vertex Set problem, which makes the problem NP-hard. We developed CP-based methods for finding an optimal relaxation of a feature subscription. In particular we presented three models: a basic constraint optimisation problem model, a model based on a global constraint, and a weighted CSP model. For the basic COP model, we studied the effect of maintaining arc consistency and two mixed consistencies during branch and bound search. Our experimental results suggest that maintaining (restricted) singleton arc consistency on the Boolean variables and arc consistency on the integer variables outperforms MAC significantly. This mixed-consistency approach was in turn outperformed empirically by the CP approach based on the SoftPrec global constraint.
We also compared the CP-based approaches with the SAT-based approaches and a mixed integer linear programming approach. In the partial weighted maximum satisfiability case we presented an atom-based encoding and investigated two symbol-based encodings. When the set of assignments is restricted to problem variables, unit propagation on the atom-based encoding is strictly stronger than unit propagation on the symbol-based binary encoding, and is equivalent to unit propagation on the symbol-based unary encoding. Empirically, the atom-based encoding is better than the symbol-based binary encoding, and it is incomparable with the symbol-based unary encoding. Overall, the results suggest that when the catalogue is sparse MIP is better in terms of runtime on hard instances. When the catalogue is dense the SAT approach based on clasp is better in terms of runtime. The SAT approach based on minisat+ and the CP approach based on the global constraint are also very competitive on the dense catalogues. Overall, the pseudo-Boolean solvers clasp and minisat+ perform better in terms of total time when compared with the other approaches.
The approaches considered in this paper are mostly one-stage approaches in the sense that the exploration is started without any approximation of the optimum value. In the future we would like to consider a two-stage approach where, at the first stage, a heuristic is used to compute an approximation of the optimal solution and, at the second stage, the exploration is carried out taking the approximate value as an initial lower bound. The CP approach based on WCSP was explored the least. It may be possible to improve its performance by using different models that overcome the problem of symmetric solutions, and by using stronger consistency techniques similar to singleton arc consistency in the case of the basic COP model. In the current setting the performance of all the approaches in terms of time includes the time taken to prove the optimality of the solution. In the future, we would like to compare all the presented approaches, and also local search methods, in terms of their anytime profiles (i.e., solution quality over time). It would also be interesting to investigate the impact of restarts on all the approaches.

Figure 1: An example of an undesirable feature interaction.

Figure 3: Total time and nodes required to solve all the instances by different approaches.

Figure 4: Total problem size of all the instances for different approaches.

Definition 6 (Corresponding Subscription). Let $F_s$, $H_s$, $F_t$, $H_t$ be an original catalogue and $\langle F_c, H_c \rangle \equiv \langle F_s \cup F_t, H_s \cup H_t \rangle$ be its reformulation. Given a feature subscription $S_o$ = , $H_s$, $F_t$, $H_t$ and a feature subscription $S_r = \langle F_r, H_r, P_r \rangle$ of the catalogue $\langle F_c, H_c \rangle$, we say that $S_r$ corresponds to $S_o$ if the following

Proposition 1 (Equivalence of Subscription Consistency). Let $F_s$, $H_s$, $F_t$, $H_t$ be an original catalogue and $\langle F_c, H_c \rangle \equiv \langle F_s \cup F_t, H_s \cup H_t \rangle$ be its reformulation. A feature subscription , $H_s$, $F_t$, $H_t$ is consistent if and only if the corresponding subscription $S_r = \langle F_r, H_r, P_r \rangle$ of catalogue $\langle F_c, H_c \rangle$ is consistent.

If $S_r$ is consistent then there exists a total order $T_r$ on $F_r$ such that $T_r \supseteq H_r \cup P_r$. Therefore there exists some $f_1, \ldots, f_k \in F_r$ such that for all $i = 0, \ldots, k$, $f_i$, $f_{i+1}$ ∈ T o s ∪ T o t , where we define s

Table 2: Average results of MAC, MRSAC$_b$ and MSAC$_b$ with the dom/deg heuristic.

Table 3: Average results of MAC, MRSAC$_b$ and MSAC$_b$ with the dom/wdeg heuristic.

Table 4: Results for the atom-based encoding using different SAT solvers.

Table 5: Mean results in terms of time obtained using the AE, SE$_u$ and SE$_b$ encodings in sat4j.