Constraints

10 February 2010

P=NP consequences

Filed under: Question — András Salamon @ 1:30

The consequences of P = NP would be “potentially stunning”, according to Stephen Cook, in his Millennium Prize overview of the P versus NP problem.

Is this really true?

Richard Lipton has argued eloquently that we should consider the possibility that P may equal NP. In this spirit, I want to briefly discuss what P = NP would imply.

As is usually noted in a discussion of P = NP consequences, one cannot simply say “if P = NP then wondrous things happen”. What is needed before we can conclude anything “potentially stunning” is an additional hypothesis along the lines of “there is a practical algorithm for solving an NP-complete problem”, by which Cook appears to mean “we can solve some NP-complete problem in time O(n^k), for smallish k“. For instance, an algorithm that somehow exploited the monster group to solve an NP-complete problem, in O(n^{2^{180}}) time, would probably not have practical applications. So the additional condition is important.

Yet, even a low-degree polynomial bound for an algorithm for an NP-complete problem would not be enough for any special implications for most problems in P. If an algorithm existed to decide TSP in time O(n^9), then it might be possible to improve the way courier companies allocated parcels to drivers. At the same time, factoring might require \Omega(n^{1024}) time, but an algorithm achieving this bound would mean little for cryptography.

Many popular accounts of P versus NP seem to be based on a belief that a fast algorithm for some NP-complete problem would let us solve any problem in NP quickly. But this is not true.

Denote by Q_r some problem in P which is in DTIME(n^{r+1}) but not in DTIME(n^r). The Time Hierarchy Theorem guarantees the existence of Q_r for each positive r. This is true regardless of the status of P versus NP.

So there are problems in P that are not “easy”, even if SAT could be solved in linear time with a small constant. If a problem is not in DTIME(n^r) then for any deterministic Turing machine deciding the problem, and any constant C, infinitely many instances of the problem require the machine to take more than Cn^r steps. I think it is reasonable to say that in this case, the problem is not “easy”.

So far the consequences of P = NP do not seem all that impressive, but there is one more thing worth noting.

Many of the known reductions from SAT to NP-complete problems are quite efficient. This means that if one of these NP-complete problems could be solved by an algorithm that requires at most O(n^k) time, then most NP-complete problems would also have time complexity “close” to O(n^k). So if P = NP then the asymptotically hardest problems in P (and NP) would be the Q_r problems. These are not necessarily NP-complete, and they certainly would not look like most of the usual NP-complete problems.

So we need to add another condition to the list: if P = NP, and there is a fast algorithm for some NP-complete problem, and the hard problems in P are of no practical interest, then some potentially stunning things would be true.

I think this third hypothesis is quite interesting.

Reductions to NP-complete problems are all via SAT. The universal reduction in the Cook-Levin Theorem turns an instance of a problem Q in NP into an instance of SAT. If the instance of Q is of size n, and the polynomial upper bound on the time required for a multi-tape non-deterministic Turing machine to decide Q is p(n), then the size of the SAT instance is O(p(n) \log p(n)) (see Stephen Cook’s proof of this).

This means that the key issue (in a P = NP world) is how efficiently a language can be recognized by an NDTM. Most of the commonly studied problems in NP have certificates that can be verified in linear time. So these problems are close to SAT, and are not going to be the Q_r languages.

The important question is then:

Question: In NP, which are the languages with hard-to-verify certificates?

If P is not equal to NP, then there are problems in NP that are asymptotically harder than any problem in P. So the efficiency of reductions to SAT is not as important in this case. But even independently of P versus NP, it would be worth looking for natural candidates for problems Q_r for each r, as well as problems Q'_r that might require \Omega(n^{r+\epsilon}) time to solve on an NDTM. Proving non-trivial lower bounds is hard, but at least we could begin with some candidate problems!

6 December 2009

Rossman’s clique result

Filed under: Commentary — András Salamon @ 19:41
Tags:

This is about three of the major results Benjamin Rossman has published to date. The first one was showing that existential positive sentences are preserved by homomorphisms, over finite structures. The second was work with Dawar and Richerby, showing that Choiceless Polynomial Time is enough to separate P from the class expressible by IFP with counting.

I here discuss the third result, from STOC 2008. This gives a superpolynomial lower bound on the size of constant-depth circuits that decide CLIQUE.

Let k be a constant. Rossman’s paper shows that for every l > k, no constant-depth circuits of size O(n^{k/4}) distinguish between the graphs that contain an l-clique and those that contain no k-clique. This gives an \omega(n^{k/4}) lower bound on the size of constant-depth circuits for deciding k-CLIQUE on n-vertex graphs.

The corollary is that the bounded variable fragments of first-order logic over finite ordered graphs form a strict hierarchy. It was previously not known if even FO^3 was less expressive than FO, on finite ordered graphs.

A logic over ordered structures has the built-in relation symbol < between elements; the paper works with such logics.

Finite ordered graphs are important because for actual computation, we typically order the vertices, then compute over the resulting ordered structure. So there are many possible realisations of the same underlying unordered graph, one for each permutation of the vertices. The ordering of the vertices carries additional information, and this often does have a big impact in implementations. It then seems a priori not out the question that the 3-variable fragment of FO over finite ordered graphs could be as expressive as FO.

Note that the class of graphs can be turned into a set by taking each canonical n-vertex graph in the equivalence class up to isomorphism as the one defined over vertices \{1,2,\ldots,n\}.

Now for the technical ingredient.

By “size-n graph property” I mean a function from the set of n-vertex graphs to \{0,1\}. Examples of size-n graph properties are “is a connected graph”, “contains a (\log n)-clique”, and “can be turned into a Hamiltonian cycle by adding or deleting at most k edges”.

Let f_n be a sequence of graph properties which can be represented as
constant depth circuits of size O(n^t) for constant t > 1/2. Let k (a positive integer) and \alpha (positive but less than 1/(2t-1)) be constants. For every n, let G_n be an Erdős-Rényi random graph with n vertices and edge probability n^{-\alpha}, let A_n be a uniform random set of k vertices of G_n (i.e. chosen uniformly from the k-element subsets of \{1,2,\ldots,n\}), and let H_n be the graph formed by adding all edges between the vertices in A_n to G_n.

Then f_n(G_n) = f_n(H_n) asymptotically almost surely.

My paraphrase: for almost all graphs from the “right” range of edge densities, constant depth circuits of at most polynomial size cannot distinguish between general instances and those guaranteed to have a k-clique.

Note that Lemma 2.1 ensures that when the edge density is in the “right” range, then there will tend to be few cliques. Hence forcing a k-clique will in the limit be a non-null operation.

Note that “has a k-clique” is a monadic second-order property, but zero-one laws do not hold for many quite weak fragments of MSO. Also, as noted by Matula in 1972, the size of the largest clique in G_n is almost surely O(\log n) if \alpha is not fixed but instead varies so that n^{-\alpha} is constant. As n grows, while k is fixed, G_n is almost surely going to have a k-clique with this model. Hence the choice of the random graph model is quite important.

The remark just before Lemma 2.1 mentions that the expectation of the
number of k-cliques in G_n is greater than 1. So in many cases there will already be a k-clique in G_n before another one is added. Much of the paper, including the machinery on s-bounded clique-sensitive
cores (which is explained further in a note), then seems devoted to addressing this difficulty. I have not taken the time to fully grasp how the technical lemmas work together to resolve the niggling doubt one may have at this point, that a circuit designed for deciding the presence of a k-clique should not necessarily be expected to distinguish between a graph that quite likely already has a k-clique, from a graph that has had one added on.

In the meantime, I look forward to a journal version with (hopefully) a simpler proof.

Quite recently Rossman has written a paper which extends this result from constant-depth to monotone circuits. This is the first average-case lower bound result for this framework. An upper bound is also presented, based on Amano’s upper bound for constant depth circuits (submitted version).

30 May 2009

Three results

Filed under: Commentary — András Salamon @ 16:36
Tags:

Three fascinating results that appear to have no connection other than being rather surprising.

First, the chromatic number of the plane. The number of colours required for colouring the real plane, where no two points unit distance apart may be the same colour, is known to be either 4, 5, 6, or 7. Shelah and Soifer noted in 2003 that the actual value relates to the proposed axiom LM of set theory that requires that every set of reals is Lebesgue measurable. Specifically, the chromatic number of the plane is 4 with the usual ZF axioms of set theory, augmented with the Axiom of Choice. This was shown by Erdős and de Bruijn in 1951. However, the chromatic number of the plane is at least 5 in ZF together with LM and a weaker version of AOC, the Principle of Dependent Choices that Bernays proposed in 1942. So the Axiom of Choice does impact the real world, after all, at least for some definition of “real world”!

The second result is a problem that only requires a poly-logarithmic number of nondeterministic steps for a polynomial-time solution, and which has a deterministic runtime upper bound that is almost, but not quite, polynomial. A hitting set X of a hypergraph H (also known as a transversal of H) intersects each edge of H. A minimal transversal contains no strict subset that is a transversal. The hypergraph of all minimal transversals of a hypergraph H is its transversal hypergraph. TRANS-HYP is the problem of deciding, given two hypergraphs H and H’, whether H’ is the transversal hypergraph of H. TRANS-HYP is in co-NP but not obviously in NP. Fredman and Khachiyan showed that it is possible to recognize the transversal hypergraph of a given H in time O(n^{\log n}), where n is the size of the input. (This result was actually expressed in terms of recognition of duals of monotone Boolean functions.) The surprising result is that it is possible to show that a given hypergraph is not a transversal hypergraph of another in polynomial time using only O(\log^2 n/\log \log n) nondeterministic steps. This was shown by Eiter, Gottlob, Makino and independently by Kavvadias and Stavropoulos, in 2002-3. If one is allowed a polynomial number of nondeterministic steps then of course one can capture NP, while a logarithmic number of nondeterministic steps captures P. This important problem therefore occupies a surprising place in the bounded nondeterminism hierarchy (which was last surveyed by Goldsmith, Levy, Mundhenk in 1997). A succinct overview of the state of the art is the paper by Eiter, Makino, Gottlob published in 2008.

Finally, an upcoming paper. Michael Mitzenmacher has briefly discussed the STOC 2009 programme; following on, I would recommend the recent paper by Moser and Tardos; it presents a beautiful and short algorithmic proof of the Lovász Local Lemma in a very general form, and appears to supersede the (excellent) paper of Moser at STOC. But back to the paper I actually wanted to highlight: in among the STOC 2009 abstracts there was the intriguing comment in the abstract of a paper by Kawarabayashi and Reed discussing their algorithm deciding the Hadwiger Conjecture. For the special case k=5 this algorithm apparently implies the Four-Colour Theorem. Since the abstract seems to indicate that the detailed case analysis of the previous constructive proofs is largely avoided, I await the full paper with great anticipation. (Edit: the full paper has corrected the sentence. The result seems to be an important step forward in proving the Hadwiger Conjecture, but it doesn’t supply a short proof for the 4CT, alas.)

3 May 2009

Parallel slowdown

Filed under: Commentary — András Salamon @ 19:15
Tags:

This is a quick high-level tour of my recent paper with Vashti Galpin, Bounds on series-parallel slowdown.

Suppose you are writing some parallel code. You fire up Matlab, and plan to use the Parallel Computing Toolbox to sort out the tedious details of scheduling. But it doesn’t run nearly as fast as you were hoping. What went wrong?

There are many ways for parallel computations to fail to achieve their potential. Ideally, running a computation on m processors should take only 1/m times as long as on one processor, a speedup of m times. At the very least, one would hope for \Theta(m) speedup. For problems involving searching, starting the search in many different places can even make the speedup superlinear; so randomized restarts after a timeout can be a good strategy for such problems. But usually various delays are introduced in the parallel implementation, compared to the idealised computation that one would wish to see, so there is a slowdown.

The first attempt to explain slowdown was Amdahl’s Law: if a fraction of the computation is inherently not parallel, then as the rest of the computation disappears to zero with the addition of more and more processors, this non-parallel fraction will dominate. So if 1% is inherently not parallel, then even if the rest takes no time at all, the computation can never achieve speedup of more than 100. The figure illustrates how the speedup S is limited by both the number of processors m as well as the fraction f of the program that is inherently not parallel.
Amdahl's Law
In practice, it is not clear how to find the magic fraction — what exactly does it mean for part of the code to be inherently not parallelisable?

Yet it turns out that adding just a little more detail to our model of parallel programs allows some progress. Let slowdown be the ratio of the time to run the program to the time one would hope it would take.

To keep things really simple, consider a particular instance of a computation, corresponding to the instructions and state of a parallel program while it runs. Group sequences of instructions into activities in some way, each having some duration. Then some parts of the computation will depend on the results produced by other parts, creating precedence constraints between activities: some activities will need to complete before other activities can start.
Now even if there are no delays at all due to communication, cache misses, resource contention, or any of the other fun things that make parallel systems so interesting, there will still be some inherent slowdown.

On the one hand, the activities could be represented directly, and we could tell the system about all the precedence constraints between them, letting it work things out.

But this is not what actually happens. Creating a synchronisation object (as in OpenCL) for every precedence constraint is tedious and adds a lot of overhead. Usually one instead uses a toolkit of parallel operations provided by the system. These involve creating and managing threads, sending and waiting for messages, or most commonly, operations like Matlab’s parfor which flags activities that all may run in parallel.

Our representation of the parallel computation tells the system about our choice of activities and the precedence constraints between them. Each particular representation has syntactic limitations, which may implicitly introduce additional precedence constraints between activities.

We consider the question:

How much slowdown can we attribute to being restricted to only series-parallel language constructs?

Neighbour-synchronised activities
Let’s look at an example. Suppose the activities are neighbour synchronised, as illustrated by the black arrows in the second figure. This structure is common in pipelines, or with finite element method calculations. The red arrows indicate precedence constraints added implicitly when we represent this computation in Matlab in the obvious way:
for i = 1 : t
parfor j = 1 : N
% compute activity a_{i,j}
end
end

Even if everything else is ideal, the additional precedence constraints will cause slowdown. The combination of the for and parfor constructs turns the activity network into a series-parallel activity network, adding extra constraints implicitly. Many of the standard constructs to express parallelism implicitly series-parallelise the computation.

In the paper, we show that the slowdown can be arbitrarily large in the worst case, for any choice of how to series-parallelise the computation! If we choose the representation before the computation actually happens, and that representation forces the activity network to be of series-parallel form, then all the stochastic effects in the parallel system may interact badly with the particular choices made.

We also show that if one postpones the choice of which series-parallelisation to use until the activity durations are known, then a slowdown of at most 4/3 should be possible. This remains a conjecture, although one that has been proved for small activity networks.

Finally, actually achieving a series-parallelisation with 4/3 slowdown seems to be a difficult problem.

What does this mean? If the system is not prone to stochastic effects that may cause some activities to take much longer than we expect, then all we have to do is to ensure all activities are similar in duration. We show that this can then be handled very easily, and the slowdown is bounded above by the ratio of the largest to the smallest activity duration. But if this is not possible to achieve, then we need to express our computation in a better way.

I propose something like
neisyn d w g f(d,w,g) where d and w are iterators, g is an optional parameter describing the maximum out-degree of the network of constraints, and f is a block of code that depends on g and the current value of d and w. As a first attempt to implement neisyn, it could be translated into the for/parfor combination above (or its equivalent), which is most likely how the code would have been written anyway if those were the only constructs available. If a better implementation of neisyn were to replace this simple hack, the code would then not need to change.

A construct such as neisyn is straightforward to use. To implement it efficiently probably requires highly non-trivial work behind the scenes, for instance an efficient dynamic scheduler. Yet there is clear value in trying this approach. Imagine the computation of a_{i,j} taking a very long time for some reason. With a particular choice of series-parallelisation, every activity that has been requested to depend on a_{i,j} will have to wait for it to finish. Instead, neisyn allows the computation to proceed as a wave front across the activities, achieving better performance overall than a more rigidly structured computation. Testing this experimentally would be interesting.

9 March 2009

Apple Keyboards: curse of the paragraph key

Filed under: Commentary — András Salamon @ 20:53

I’ve been doing a fair bit of typing on a new Apple wired keyboard.  It’s fairly flat and manages to support quick typing. It reminds me a little of the keyboard that the ZX Spectrum sported (it’s not nearly as spongy, luckily).

However. Let’s rewind.

The UK version of Apple’s keyboard has a nifty pound symbol that I never use (three-letter currency abbreviations are much more useful). It also has to squeeze in a key for the crucially important paragraph/plus-minus key.  This key, possibly mandated by some European Union directive, means that there is a smaller left shift button.  Fine, one could live with all these idiosyncrasies.  But why, oh why, did the backquote/tilde key have to be moved from its top-left position?

Tilde is of course the symbol used to abbreviate $HOME in most shells.  Backquote denotes the position of a tag in vi.  Guess what — I use both the command line and vi all the time.

So I bought the International English keyboard instead.  Great, the dollar sign returns to Shift-4 (try writing some shell scripts or Perl 5 without easily accessible variable sigils sometime), hash is back to Shift-3, and the pound symbol is still available via Opt-3 (just in case).  But, but, BUT! The paragraph key is still there, usurping the backquote/tilde key’s position.  Aaargh!  What exactly is wrong with the usual North American Apple Keyboard layout for English?  Do Mac users outside the USA really require paragraph and plus-minus keys; is this an intimation (or even a slur) that International English is for those corporate minded and also equivocal?

Enough ranting: a reasonable workaround is to use a free utility called Ukelele to generate a custom keymap. I changed the paragraph key to a duplicate backquote/tilde key, solving this particular problem. Great for those whose fingers are hard-wired to use the top-left key for useful stuff, instead of peppering keyboard input with strange symbols. Thanks to the folks at SIL for making this tool available.

STACS 2009

Filed under: Commentary — András Salamon @ 20:24
Tags:

STACS 2009 was held 26 to 28 February 2009 in Freiburg, Germany.  Several papers are relevant to constraints, here are rather biased synopses of two of these.

Andrei A. Bulatov, Víctor Dalmau, Martin Grohe, and Dániel Marx: Enumerating Homomorphisms

Let ECSP(S,T) be the problem instance that requires generating all homomorphisms from S to T.  For classes A and B of relational structures, all with the same finite signature, let ECSP(A,B) be the problem consisting of all instances ECSP(S,T) where S is in A and T is in B.  (In particular, the fixed finite signature means that only constraint satisfaction problems of bounded arity are dealt with here.)  In general ECSP has exponentially many solutions, so of interest are A and B that lead to a procedure to generate all solutions in time that is polynomial in the total size of the input and the output.  The main focus of this paper are conditions that guarantee that solutions can be computed with polynomial delay (WPD), which guarantees polynomial total time.

Lemma 3.3 is interesting: for a fixed core T, and letting C_T be the class of disjoint unions of structures S and T where there is a homomorphism from S to T, the decision problem CSP(_,T) is in PTIME iff the problem of finding a solution SCSP(C_T,_) is in PTIME.  Lemma 3.4 similarly relates SCSP and ECSP: for a class A of relational structures, let A’ be the class of structures constructed from A by adding |A| new independent vertices to each structure; then SCSP(A,_) is in PTIME iff ECSP(A’,_) is in PTIME.  Theorem 4.10 states that if A is a class of structures such that the k-core of each structure in A has tree width at most k, then ECSP(A,_) is solvable WPD.  Let CQE(A,B) be the problem of enumerating all extendible partial mappings from a given subset of the vertices of the given source structure S from A to the given target structure T from B, in the sense that the partial mapping can be extended to a homomorphism from S to T.  For a particular class A whose 2-cores have bounded width, Example 6.2 shows that if CQE(A,_) is solvable WPD, then W[1] = FPT.

Dániel Marx: Tractable Structures for Constraint Satisfaction with Truth Tables

This paper deals with unbounded arity problems which are represented as truth tables.  Most of the work on complexity of constraint satisfaction has focused on listing all tuples in each relational structure in the instance.  A truth table representation is a generalized adjacency matrix: instead of listing a tuple, a 1 represents its presence.  For sparse problems truth tables are therefore very large, but for dense problems they are reasonable.  The main result is Theorem 1.5: assuming a conjecture of the author, CSP(A,_) is tractable iff the class of hypergraphs of structures in A has bounded adaptive width.  Adaptive width is a new measure based on fractional independent sets that (Proposition 5.1) is at most as large as fractional hypertree width.  Previously fractional hypertree width was the most general width measure known that guaranteed tractability.

16 November 2008

Comments on Taleb

Filed under: Commentary — András Salamon @ 23:50

Nassim Nicholas Taleb makes provocative statements with several interesting ideas underpinning them.  To try to get past the over-the-top rhetoric of his interviews and two widely known books Fooled by Randomness and The Black Swan, I spent some time reading the papers at Taleb’s site.

Simplifying vastly, Taleb currently seems to have three main themes.

  1. The convergence to the central limit theorem is too slow to be of any use in processes based on social interaction.  This means that economics (and other social sciences) often cannot use a central limit theorem and work with the limiting stable distributions, but has to grapple with the unknown and often very weird distribution such data has.  In particular, using Gaussian distributions as any kind of approximation is often a very bad idea, even if variance is finite.
  2. The second point is that many social phenomena have self-similar distributions.  Barabási made this idea popular, and it has been widely observed in many kinds of network structures over the last decade.  Taleb argues that most economic time series behave this way also.  This means that their distributions are heavy-tailed, and that it is very difficult to estimate their parameters.
  3. Finally, Taleb notes that many social and economic phenomena are dominated by the effect of single events.  When regulators rejig the rules of a market, or war breaks out, or a meteor hits the earth, the effect is so large that it completely dominates the many years of “normal” behaviour.

This adds up to a pretty pessimistic point of view, on the surface.  However, Taleb appears to be formulating prescriptions for living with this state of affairs.  It seems he believes it is important to bound losses, through mechanisms such as reinsurance or options.  If dragons lurk in the heavy tails, then one can at least reduce their effect by donning some armour.  Taleb hasn’t yet gone far beyond the ranting stage in his published writing, as far as I can see.  It would seem likely that he does have specific prescriptions, since he is one of the principals at Universa Investments, which aims to achieve capital preservation regardless of market conditions.

One way to address the first issue would be to move further along the way towards continuous markets.  Based on the convergence of various distributions, one could estimate the number of events that would have to be observed for some summary event to be close enough to a desired limiting distribution.  There is already a strong move towards markets that are ever less discrete, and doing this analysis would tell us how far we have to go.  Academics in quantitative finance dealing with high frequency data and rough path theory are addressing these kinds of issues.

I’m not sure how to argue with the second issue, unless one can find new types of interaction underlying processes that do not generate self-similar laws.  This might be feasible, but would perhaps require new markets and ways of interacting.  All markets are to some extent designed, so this may be a long term approach, harnessing the work of many people, from different disciplines, on network science.

The issue of single events is probably unavoidable.  If one plans for the meteor strike, then in the interval until it happens life is pretty miserable (what with living underground, keeping a large stock of tinned food, and keeping up a gloomy outlook).  It seems unlikely that human nature could be changed to prepare for the worst, when that could be years or decades away.  The engineering solution (of doubling or trebling tolerances, because we don’t actually know the distributions), or the insurance solution (of keeping a reserve for a rainy 40 days), are difficult to implement when the outliers really are likely to be off the scale.  But again, people are working on this.  (An example was the recent Global Catastrophic Risks Conference.)

In conclusion: Taleb makes some good points, but his claim that everyone is ignoring these issues appears somewhat hollow.

3 August 2008

Minimizing free energy

Filed under: Commentary — András Salamon @ 13:22
Tags:

This is third-hand and several months old, but quite fascinating.  Stanislas Dehane wrote for this year’s Edge question about capturing the overall behaviour of the brain with a compact model.  The gist of his contribution (edited for brevity):

The brain is the outcome of five hundred million years of tinkering.  How could such a jumble be captured by a single mathematical law?

Karl Friston, from UCL in London, has presented “a theory of cortical responses”.  Friston’s theory rests on a single premise: the brain optimizes a free energy function.  This function measures how closely the brain’s internal representation of the world approximates the true state of the real world. From this simple postulate, Friston spins off an enormous variety of predictions.

Neuroscience now has a wealth of beautiful theories that should attract the attention of top-notch mathematicians — we will need them!

Three relevant papers, from Friston’s site:

Free energy and the brain (with KE Stephan, 2007)

A free energy principle for the brain (with J Kilner and L Harrison, 2006)

A theory of cortical responses (2005)

1 August 2008

ECAI 2008 (part 2)

Filed under: Commentary — András Salamon @ 23:23
Tags: ,

Many of the talks at the main conference were interesting; just a few are outlined here.

Flener, Pearson: Solving Necklace Constraint Problems

Pierre Flener showed how standard constraint satisfaction techniques could be applied to model an enumeration problem in discrete mathematics, without needing any complicated special purpose algorithms.  However, the necklace constraints used in such problems have nice applications in scheduling, and fast propagators can be built.

Bessiere, Hebrard, Hnich, Kiziltan, Walsh: SLIDE: a useful special case of the CardPath constraint

Toby Walsh discussed how SLIDE can be used as a building block for many other global constraints (including REGULAR and CARDPATH, which themselves generalize many others).  The claim is that SLIDE can be as efficient as specialized propagators for sequencing problems, even though propagation for such a general constraint is NP-complete.  When SLIDE is used to encode tractable constraints in the right way, there is a lot of regularity in the encoding, which then yields tractable propagation.  Using an intractable propagator in a tractable way is an interesting idea that I feel it should be possible to formalize.  Experimental results are presented in support, but the theory in Section 6 seems further development to constitute a rigorous argument.  Of course, 5 pages aren’t really enough to do the topic justice, and I look forward to the journal version.  (By the way, coding SLIDE in Tailor’s implementation of Essence’ is proving quite tricky — the lack of macros makes it rather difficult…  The unbounded arity of some of these constraints also means that the Global Constraint Catalog doesn’t list them either!)

Samadi, Schaeffer, Torabi Asr, Samar, Azimifar: Using Abstraction in Two-Player Games

Mehdi Samadi demonstrated that it is possible to abstract from game positions, for instance by removing some pawns from a chess endgame.  By dropping some pieces from the position one may end up with a position close enough to endgame databases so that the gain in accuracy is more important than the inaccuracy of abstraction.  Quite convincing experimental evidence was presented that pure abstraction outperformed Crafty.

Hoche, Flach, Hardcastle: A Fast Method for Property Prediction in Graph-Structured Data from Positive and Unlabelled Examples

Peter Flach spoke about using an iterative neighbourhood majority voting scheme to classify vertices in a graph.  This was apparently faster and not worse than a Markov chain approach.  I think this could be used to weakly tag new objects by pre-populating the suggested tags, which would potentially encourage tagging, instead of presenting an empty tag cloud to the first tagger.

There were several talks about applications of BDDs to various problems.  Donald Knuth’s statement in a talk last year (that BDDs were at the same stage as linked lists were before everyone started using them as a basic data structure in the 1960s) seems more and more on the mark.  There is something in the air…

I missed Adrian Petcu’s talk on his prize-winning dissertation, but a brief outline conveyed on the bus back to the hotel one evening is that 1) communication overhead can swamp the benefit of distributing search, but 2) there are ways of grouping messages to achieve good speedup.

A tiring week, but fun and stimulating.

ECAI 2008

Filed under: Commentary — András Salamon @ 21:33
Tags: ,

I attended ECAI 2008 in Patras, Greece last week. The conference was rather large (perhaps even unwieldy), with over 30 workshops, seven concurrent tracks in the main conference, and more than 600 people attending. Due to the excellent turnout from the constraints community, most of my time was spent at talks about constraints and search, with a few talks from other fields to provide some perspective. It was quite easy to speak to people, perhaps because our submission (Cooper, Jeavons, Salamon: Hybrid tractable CSPs which generalize tree structure) unexpectedly received the best paper award. There was a large turnout at my talk, which was a bit daunting!

Before the main conference, several interesting papers were presented at the Workshop on Modeling and Solving Problems with Constraints (proceedings are available online).

Battiti, Campigotto: Penalties may have collateral effects. A MAX-SAT analysis

Paulo Campigotto argued that fiddling with clause weights to smooth out a MAX-SAT fitness surface doesn’t work except to nudge solutions along from a local minimum; there are simply too many side effects to use this technique globally.

Freuder, Wallace, Nordlander: Debugging Constraint Models with Metamodels and Metaknowledge

Richard Wallace discussed a system named Ananke, after the Goddess of Bonds (and therefore an appropriate deity for constraint satisfaction).  Given specific solutions that the user of the system expects, Ananke suggests ways to alter an overconstrained specification of a constraint problem so that those solutions become possible.  The idea is that this would help find constraints which unnecessarily disallow desired solutions.

Gent, Miguel, Rendl: Common Subexpression Elimination in Automated Constraint Modelling

Andrea Rendl talked about the Tailor system to translate high-level Essence’ specifications of constraint problems into models that can be run using existing constraint programming systems such as Minion.  In the translation, it is possible to detect common subexpressions, as a byproduct of parsing and with minimal overhead, resulting in more compact constraint programs and sometimes also much faster search.  Even where eliminating subexpressions doesn’t yield any improvement, there is essentially no cost to including it as part of the translation phase.

Makeeva, Szymanek: Revisiting the Generalized Among Constraint

The main point in this tightly argued talk by Polina Makeeva was that it can be faster to use AMONG (if carefully implemented) than to decompose into individual global constraints, since this allows sharing and reuse of information.  This kind of result would seem to argue against the traditional, loosely coupled view of constraints interacting purely through a set of domains.

Maher, Narodytska, Quimper, Walsh: Flow-Based Propagators for the SEQUENCE and Related Global Constraints

Nina Narodytska argued that for some global constraints one can use an Integer Linear Programming model if a graph model is not known, while some kinds of ILP models can be transformed into flow problems.  If both techniques are applied, one can sometimes find fast flow-based propagators.  An example is the SEQUENCE constraint, and an algorithm was presented to enforce domain consistency down a branch of the search tree in O(n^2) time, an O(\log n) factor improvement on the best previous algorithm.

Next Page »

Blog at WordPress.com.