
Underestimating the cost of a soft constraint is dangerous: revisiting the edit-distance based soft regular constraint

Journal of Heuristics

Abstract

Many real-life problems are over-constrained, so that no solution satisfying all their constraints exists. Soft constraints, with costs denoting how much the constraints are violated, are used to solve these problems. We use the edit-distance based SoftRegular constraint as an example to show that a propagation algorithm that sometimes underestimates the cost may guide the search to incorrect (non-optimal) solutions to an over-constrained problem. To compute the cost of the edit-distance based SoftRegular constraint correctly, we present a quadratic-time propagation algorithm based on dynamic programming and a proof of its correctness. We also give an improved propagation algorithm that borrows an idea from computing the edit distance between two strings; this idea may also be applied to other constraints with propagators based on dynamic programming. The asymptotic time complexity of our improved propagator is always at least as good as that of our quadratic-time propagator, and significantly better when the edit distance is small. Our propagators achieve domain consistency on the problem variables and bounds consistency on the cost variable. Our method can also be adapted for the violation measure of the edit-distance based Regular constraint for constraint-based local search.




References

  • Apt, K.: Principles of Constraint Programming. Cambridge University Press, Cambridge (2003)

  • Beldiceanu, N., Carlsson, M., Petit, T.: Deriving filtering algorithms from constraint checkers. In: Wallace, M. (ed.) Proceedings of CP’04, volume 3258 of LNCS, pp. 107–122. Springer, Berlin (2004)

  • Beldiceanu, N., Carlsson, M., Flener, P., Pearson, J.: On matrices, automata, and double counting in constraint programming. Constraints 18(1), 108–140 (2013)

  • He, J., Flener, P., Pearson, J.: An automaton constraint for local search. Fundam. Inform. 107(2–3), 223–248 (2011)

  • Kadioǧlu, S., Sellmann, M.: Grammar constraints. Constraints 15(1), 117–144 (2010)

  • Katsirelos, G., Narodytska, N., Walsh, T.: The weighted CFG constraint. In: Perron, L., Trick, M. (eds.) Proceedings of CP-AI-OR’08, volume 5015 of LNCS, pp. 323–327. Springer, Berlin (2008)

  • Katsirelos, G., Maneth, S., Narodytska, N., Walsh, T.: Restricted global grammar constraints. In: Gent, I.P. (ed.) Proceedings of CP’09, volume 5732 of LNCS, pp. 501–508. Springer, Berlin (2009)

  • Katsirelos, G., Narodytska, N., Walsh, T.: The weighted grammar constraint. Ann. Oper. Res. 184, 179–207 (2011)

  • Menana, J., Demassey, S.: Sequencing and counting with the multicost-regular constraint. In: van Hoeve, W.-J., Hooker, J.N. (eds.) Proceedings of CP-AI-OR’09, volume 5547 of LNCS, pp. 178–192. Springer, Berlin (2009)

  • Pesant, G.: A regular language membership constraint for finite sequences of variables. In: Wallace, M. (ed.) Proceedings of CP’04, volume 3258 of LNCS, pp. 482–495. Springer, Berlin (2004)

  • Pralong, B.: Implémentation de la contrainte Regular en Comet. Master’s Thesis, École Polytechnique de Montréal, Canada (2007)

  • Quimper, C.-G., Walsh, T.: Global grammar constraints. In: Benhamou, F. (ed.) Proceedings of CP’06, volume 4204 of LNCS, pp. 751–755. Springer, Berlin (2006)

  • Ukkonen, E.: Algorithms for approximate string matching. Inf. Control 64(1–3), 100–118 (1985)

  • Van Hentenryck, P., Michel, L.: Constraint-based Local Search. MIT Press, Cambridge (2005)

  • van Hoeve, W.-J., Pesant, G., Rousseau, L.-M.: On global warming (Softening global constraints). In: Proceedings of the 6th International Workshop on Preferences and Soft Constraints, available at http://www.andrew.cmu.edu/user/vanhoeve/papers/softglob.pdf (2004). Accessed 16 May 2013

  • van Hoeve, W.-J., Pesant, G., Rousseau, L.-M.: On global warming: flow-based soft global constraints. J. Heuristics 12(4–5), 347–373 (2006)

  • Wagner, R.A., Fischer, M.J.: The string-to-string correction problem. J. ACM 21, 168–173 (1974)

  • Zanarini, A., Milano, M., Pesant, G.: Improved algorithm for the soft global cardinality constraint. In: Beck, J.C., Smith, B. (eds.) Proceedings of CP-AI-OR’06, volume 3990 of LNCS, pp. 288–299. Springer, Berlin (2006)


Acknowledgments

The authors are supported by Grants 2007-6445 and 2011-6133 of the Swedish Research Council (VR), and Jun He is also supported by Grant 2008-611010 of China Scholarship Council and the National University of Defence Technology of China. Many thanks to George Katsirelos for some useful discussions on Sect. 6.2, to Louis-Martin Rousseau for some useful discussions on Sect. 4, and to the anonymous referees of this paper for their helpful comments.

Corresponding author

Correspondence to Jun He.

Appendix A: An example of encoding the SoftRegular constraint incorrectly


Given an edit-distance based SoftRegular \((X,M,z)\) constraint with \(M=\langle Q,\Sigma ,\delta ,q_0,F \rangle \) and \(\left|X \right|=n\), we give an example to show that the CYK-based propagator for the WeightedGrammar constraint of (Katsirelos et al. 2008, 2011) with a weighted grammar obtained from \(M\) may underestimate the edit-distance based cost measure, as the grammar accepts words of the whole regular language instead of the \(n\)-letter regular language of \(M\). Hence we cannot use the weighted grammar obtained from \(M\) when encoding the SoftRegular \((X,M,z)\) constraint with the WeightedGrammar constraint.

The DFA \(M\) in Fig. 1 can be converted into the following context-free grammar (CFG) \(\text{ G }_1\) by encoding every transition of \(M\) into a linear production, where O is the start symbol:

$$\begin{aligned} \text{ G }_{1}:&\text{ O } \rightarrow \text{ dD } |\text{ eE }| \varepsilon \\&\text{ D } \rightarrow \text{ dD } |\text{ vO } \\&\text{ E } \rightarrow \text{ vO } \end{aligned}$$
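The transition-to-production encoding can be sketched in a few lines of Python. Since Fig. 1 is not reproduced here, the transition table `DELTA` below is an assumption, read back from the productions of \(\text{ G }_1\) (states O, D, E, with O both initial and accepting); the function name `dfa_to_grammar` is likewise only illustrative.

```python
# Assumed DFA, reconstructed from the productions of G_1 (Fig. 1 is not available here).
DELTA = {
    ("O", "d"): "D", ("O", "e"): "E",
    ("D", "d"): "D", ("D", "v"): "O",
    ("E", "v"): "O",
}
ACCEPTING = {"O"}

def dfa_to_grammar(delta, accepting):
    """Encode each transition q --a--> q' as the linear production q -> a q'.

    Each accepting state additionally receives an epsilon production.
    """
    productions = {}
    for (state, letter), target in sorted(delta.items()):
        productions.setdefault(state, []).append(letter + target)
    for state in sorted(accepting):
        productions.setdefault(state, []).append("ε")
    return productions

grammar = dfa_to_grammar(DELTA, ACCEPTING)
for head in ("O", "D", "E"):
    print(head, "->", " | ".join(grammar[head]))
```

Under these assumptions, the printed productions coincide with those of \(\text{ G }_1\). Note that the resulting grammar carries no information about the word length \(n\).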

The CFG \(\text{ G }_1\) can be converted into the following Chomsky normal form (CNF) \(\text{ G }_2\):

$$\begin{aligned} \text{ G }_2:&\text{ O } \rightarrow \text{ Y }_\text{ d }\text{ D } | \text{ Y }_\text{ e }\text{ E } \\&\text{ D } \rightarrow \text{ Y }_\text{ d }\text{ D } | \text{ Y }_\text{ v }\text{ O } | \text{ v } \\&\text{ E } \rightarrow \text{ Y }_\text{ v }\text{ O } | \text{ v } \\&\text{ Y }_\text{ d } \rightarrow \text{ d } \\&\text{ Y }_\text{ e } \rightarrow \text{ e } \\&\text{ Y }_\text{ v } \rightarrow \text{ v } \end{aligned}$$

The WeightedGrammar constraint can be used to encode the edit-distance based SoftGrammar constraint (Katsirelos et al. 2008, 2011). Given the CNF \(\text{ G }_2\), the following weighted productions will be added to simulate substitution, insertion, and deletion operations:

$$\begin{aligned} \text{ substitution } \text{ productions }:&\text{ Y }_\text{ d } \rightarrow \text{ e } | \text{ v }, \text{ with } \text{ weight } \text{1 } \\&\text{ Y }_\text{ e } \rightarrow \text{ d } | \text{ v }, \text{ with } \text{ weight } \text{1 } \\&\text{ Y }_\text{ v } \rightarrow \text{ d } | \text{ e }, \text{ with } \text{ weight } \text{1 } \\ \text{ insertion } \text{ productions }:&\text{ Y }_\text{ d } \rightarrow \varepsilon , \text{ with } \text{ weight } \text{1 } \\&\text{ Y }_\text{ e } \rightarrow \varepsilon , \text{ with } \text{ weight } \text{1 } \\&\text{ Y }_\text{ v } \rightarrow \varepsilon , \text{ with } \text{ weight } \text{1 } \\&\text{ D } \rightarrow \varepsilon , \text{ with } \text{ weight } \text{1 } \\&\text{ E } \rightarrow \varepsilon , \text{ with } \text{ weight } \text{1 } \\ \text{ deletion } \text{ productions }:&\text{ O } \rightarrow \text{ HO } | \text{ OH }, \text{ with } \text{ weight } \text{0 } \\&\text{ D } \rightarrow \text{ HD } | \text{ DH }, \text{ with } \text{ weight } \text{0 } \\&\text{ E } \rightarrow \text{ HE } | \text{ EH }, \text{ with } \text{ weight } \text{0 } \\&\text{ Y }_\text{ d } \rightarrow \text{ HY }_\text{ d } | \text{ Y }_\text{ d }\text{ H }, \text{ with } \text{ weight } \text{0 } \\&\text{ Y }_\text{ e } \rightarrow \text{ HY }_\text{ e } | \text{ Y }_\text{ e }\text{ H }, \text{ with } \text{ weight } \text{0 } \\&\text{ Y }_\text{ v } \rightarrow \text{ HY }_\text{ v } | \text{ Y }_\text{ v }\text{ H }, \text{ with } \text{ weight } \text{0 } \\&\text{ H } \rightarrow \text{ d } | \text{ e } |\text{ v }, \text{ with } \text{ weight } \text{1 } \end{aligned}$$

Consider the SoftRegular \((X,M,z)\) constraint, where \(X=\langle x_1,\ldots ,x_5 \rangle \) is a sequence of \(\left|X \right|=5\) decision variables, with current domains \(\text{ dom }(x_1) = \text{ dom }(x_3) = \{\text{ e }\},\, \text{ dom }(x_2) = \{\text{ v }\}, \,\text{ dom }(x_4) = \{\text{ d, } \text{ v }\}\), and \(\text{ dom }(x_5) = \{\text{ d }\}\), and \(M\) is the DFA of Fig. 1. The minimum edit distance between the feasible assignments (namely \(\left\{ \text{ evedd, } \text{ evevd }\right\} \)) and the 5-letter regular language accepted by \(M\) (namely \(\{\text{ ddddv, } \text{ ddvdv, } \text{ ddvev, } \text{ dvddv, } \text{ evddv }\}\)) is 2. However, as shown in Fig. 9, the minimum weight computed by the CYK-based propagator of (Katsirelos et al. 2008, 2011) with the obtained weighted grammar is 1 instead of 2. This is the same value as the one computed in (van Hoeve et al. 2004, 2006), and it measures the edit distance from the word evevd to evev (in the 4-letter regular language accepted by \(M\)) through one deletion operation. Hence the CYK-based propagator with the unsuitable weighted grammar underestimates the cost measure in this case.
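The two values in this example can be checked by brute force. The following Python sketch is not one of the propagators discussed in the paper: it enumerates words and uses the textbook dynamic program for the edit distance between two strings. The transition table is again an assumption reconstructed from \(\text{ G }_1\), and the bound of 7 letters for the "any length" language is an arbitrary choice for illustration.

```python
from itertools import product

# Assumed DFA, reconstructed from the productions of G_1 (Fig. 1 is not available here).
DELTA = {
    ("O", "d"): "D", ("O", "e"): "E",
    ("D", "d"): "D", ("D", "v"): "O",
    ("E", "v"): "O",
}
START, ACCEPTING, ALPHABET = "O", {"O"}, "dev"

def accepts(word):
    """Run the DFA on word; a missing transition rejects."""
    state = START
    for letter in word:
        state = DELTA.get((state, letter))
        if state is None:
            return False
    return state in ACCEPTING

def language(length):
    """All accepted words with exactly `length` letters, in sorted order."""
    words = ("".join(w) for w in product(ALPHABET, repeat=length))
    return sorted(w for w in words if accepts(w))

def edit_distance(a, b):
    """Unit-cost substitution, insertion, and deletion, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]

def min_cost(assignments, words):
    return min(edit_distance(a, w) for a in assignments for w in words)

assignments = ["evedd", "evevd"]
five_letter = language(5)
any_length = [w for n in range(8) for w in language(n)]  # bound 7 is arbitrary

print(five_letter)                         # ['ddddv', 'ddvdv', 'ddvev', 'dvddv', 'evddv']
print(min_cost(assignments, five_letter))  # 2
print(min_cost(assignments, any_length))   # 1
```

Restricting the target words to exactly five letters gives cost 2, while allowing words of any length gives cost 1, witnessed by the pair evevd and evev.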

Fig. 9 A minimum-weight parse tree computed by the CYK-based propagator of (Katsirelos et al. 2011)

Actually, in order to make the CYK-based propagator for the WeightedGrammar constraint of (Katsirelos et al. 2008, 2011) work properly for the edit-distance based SoftGrammar constraint, we claim that two more changes are needed in addition to the one mentioned for this purpose on page 200 of (Katsirelos et al. 2011), which changes a loop control variable in order to handle \(\varepsilon \) productions.

  1. Unit-weight \(\varepsilon \) productions are introduced to simulate insertion operations. Here, \(\varepsilon \) production means a production that generates \(\varepsilon \). In order to handle these \(\varepsilon \) productions, the CYK-based propagator allows a symbol generated from another symbol in the same cell. For example, in Fig. 10, there are 2 symbols, C and E, in the cell \((i=1,j=0)\) generated from the two insertion productions \(\text{ C }\rightarrow \varepsilon \) and \(\text{ E }\rightarrow \varepsilon \). Note that the example of Fig. 10 has no relation to our running example of Sect. 4. In the cell \((i=1,j=2)\), there are three symbols O, D, and A generated from the three productions \(\text{ O }\rightarrow \text{ CD }, \,\text{ D }\rightarrow \text{ EA }\), and \(\text{ A }\rightarrow \text{ BC }\) respectively. When the CYK-based propagator computes the lower (or upper) bounds, it is crucial that these three productions are explored in a correct order: first \(\text{ A }\rightarrow \text{ BC }\), then \(\text{ D }\rightarrow \text{ EA }\), and finally \(\text{ O }\rightarrow \text{ CD }\) (or in the opposite order), so that the lower (or upper) bounds of O, D, and A in cell \((i=1,j=2)\) are computed correctly. Hence all symbols in each cell must be sorted before computing the bounds.

  2. Line 73 of the CYK-based propagator on page 191 of (Katsirelos et al. 2011), where the domains of the decision variables are pruned, also needs to be modified to suit the case of substitution and deletion productions, so that the domain of the decision variable \(x_i\) should not be pruned if there exists a symbol with an upper bound of \(1\) in cell \((i,1)\) denoting a substitution or deletion production.
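The ordering requirement of the first change can be illustrated with a topological sort of the symbols within one cell. This is only a sketch under strong assumptions, not the propagator of (Katsirelos et al. 2011): the lower bound 0 assumed for A is a hypothetical value, the weights of the binary productions themselves are taken to be 0, and all names are illustrative. It uses `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Productions whose head lies in the same cell (i=1, j=2) as one of its body symbols:
# head -> (symbol deriving epsilon in cell (1,0), symbol in the same cell (1,2)).
SAME_CELL = {"O": ("C", "D"), "D": ("E", "A")}
EPSILON_WEIGHT = {"C": 1, "E": 1}  # unit-weight insertion productions C -> ε and E -> ε
BASE_LOWER_BOUND = {"A": 0}        # hypothetical bound for A, obtained from A -> BC

def evaluation_order(same_cell, base):
    """Sort the symbols of the cell so that each one follows the symbol it depends on."""
    deps = {head: {body[1]} for head, body in same_cell.items()}
    for symbol in base:
        deps.setdefault(symbol, set())
    return list(TopologicalSorter(deps).static_order())

def lower_bounds(same_cell, eps_weight, base):
    """Propagate lower bounds inside the cell, following the sorted order."""
    bounds = dict(base)
    for symbol in evaluation_order(same_cell, base):
        if symbol in same_cell:
            nullable, other = same_cell[symbol]
            bounds[symbol] = eps_weight[nullable] + bounds[other]
    return bounds

print(evaluation_order(SAME_CELL, BASE_LOWER_BOUND))              # ['A', 'D', 'O']
print(lower_bounds(SAME_CELL, EPSILON_WEIGHT, BASE_LOWER_BOUND))  # {'A': 0, 'D': 1, 'O': 2}
```

Processing O before D, or D before A, would read a bound that has not yet been computed, which is why the symbols of a cell need to be sorted first.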

Fig. 10 An example for the CYK-based propagator of (Katsirelos et al. 2011) with insertion productions that generate \(\varepsilon \)


Cite this article

He, J., Flener, P. & Pearson, J. Underestimating the cost of a soft constraint is dangerous: revisiting the edit-distance based soft regular constraint. J Heuristics 19, 729–756 (2013). https://doi.org/10.1007/s10732-013-9222-1


  • DOI: https://doi.org/10.1007/s10732-013-9222-1
