Day 232: Travelling Salesman / NP-Hard

There's some confusion here about what NP-Hard means. NP-Hard means "at least as hard as the hardest problems in NP". However, it doesn't require that the problem itself be in NP. The overlap, where problems are NP-Hard *and* in NP, is NP-Complete.

The Travelling Salesman decision problem ("is there a tour of length at most L?") is NP-Complete. The full problem, actually finding a shortest tour, has to be *at least* as hard -- if you have found a shortest tour, you can immediately answer the decision question. But the full problem is not NP-Complete (it isn't a decision problem, so it can't be in NP); it is NP-Hard.

So Casey is correct. P=NP doesn't put Travelling Salesman in P.
Disclaimer: I am not a theoretical computer scientist.

That's correct for the technical reason that the Travelling Salesman Problem (TSP) isn't a decision problem. It's a function problem: we need to map an instance of the problem to a tour of minimal length. (I think technically this is better described as an optimization problem, but I'm not sure the difference matters for what I'm going to say here.)

As it turns out, there's a corresponding pair of complexity classes for function problems, FP and FNP (more[sup]1[/sup] interesting classes can be found in the Complexity Zoo, though good intuitive explanations of most of this stuff seem hard to come by).

I'm not sure exactly how TSP fits into that group of classes (I think it's probably FNP-complete but that's just kind of a guess) BUT the method that Casey mentioned on stream today actually can be used to show that it is in FP[sup]2[/sup] if P = NP, sort of.

The argument was basically a binary search, starting from some value like the sum of all the weights, until you find the minimum tour length, M. From there you can remove edges from the graph and query the oracle again: if there stops being a tour of length M, you know the edge you just removed was part of the last remaining minimal-length tour, so you put it back. By repeating this you can actually solve the function problem version: you keep removing edges until you're left with just the edges that form a minimal-length tour.
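
To make that concrete, here's a rough Python sketch of that self-reduction. Everything in it is illustrative: the name tsp_decision and the brute-force oracle behind it are stand-ins for the hypothetical polynomial-time decision procedure you'd have if P = NP, and the point is only how the binary search and the edge removal would use such an oracle.

[code]
from itertools import permutations

def tsp_decision(n, weights, limit):
    # Stand-in oracle: "is there a tour of total weight <= limit?"
    # Brute force here, so this sketch is exponential; in the argument this
    # would be the hypothetical polynomial-time oracle.
    INF = float('inf')
    for perm in permutations(range(1, n)):
        tour = (0,) + perm + (0,)
        total = sum(weights.get((min(a, b), max(a, b)), INF)
                    for a, b in zip(tour, tour[1:]))
        if total <= limit:
            return True
    return False

def minimal_tour_edges(n, weights):
    # Step 1: binary search for the minimum tour length M.
    lo, hi = 0, sum(weights.values())
    if not tsp_decision(n, weights, hi):
        return None                        # no tour at all
    while lo < hi:
        mid = (lo + hi) // 2
        if tsp_decision(n, weights, mid):
            hi = mid
        else:
            lo = mid + 1
    best = lo

    # Step 2: try deleting each edge; if a tour of length best still exists
    # without it, leave it out, otherwise it belongs to the minimal tour.
    remaining = dict(weights)
    for edge in list(weights):
        trial = dict(remaining)
        del trial[edge]
        if tsp_decision(n, trial, best):
            remaining = trial
    return best, sorted(remaining)

# Tiny example: 4 nodes, complete graph, edge (i, j) keyed with i < j.
w = {(0, 1): 1, (0, 2): 4, (0, 3): 3,
     (1, 2): 2, (1, 3): 5, (2, 3): 1}
print(minimal_tour_edges(4, w))  # -> (7, [(0, 1), (0, 3), (1, 2), (2, 3)])
[/code]

The edge-removal step works because deleting an edge can never create a shorter tour, so the optimum either stays at M or the oracle starts saying no.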

Casey's problem with that argument was that you could set the weights such that they sum to 2^(2^n), and thus your binary search would take 2^n uses of the oracle to find the minimum length. This is a pretty interesting argument. My problem with it is that the difficulty in doing the binary search comes purely from the fact that the lengths of the numbers are growing exponentially. If the lengths of the numbers are growing exponentially, I don't think you can stop considering them when talking about the size of the input, since operations on the numbers then become O(2^n) instead of O(1). I can't really think of any examples off the top of my head, but I suspect you could do some similar double-exponential weighting with problems "in" P and end up showing that they're actually not in P.
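
Just to put rough numbers on that (my own illustration, not anything from the stream): the binary search needs about as many oracle calls as the total weight has bits, which is exponential in n but only linear in the length of the numbers you were handed as input.

[code]
# Quick numeric check in Python, assuming the weights are allowed to sum
# to something around 2^(2^n). The figures are purely illustrative.
for n in (4, 8, 16):
    total = 2 ** (2 ** n)              # sum of all weights, roughly 2^(2^n)
    oracle_calls = total.bit_length()  # binary search over [0, total] needs ~log2(total) calls
    print(n, oracle_calls)             # exponential in n...
# ...but the weights themselves already take on the order of 2^n bits to
# write down, so the call count is only linear in the size of those numbers.
[/code]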

If you restrict the weight precision to something that can only grow like 2^(P(n)), then the argument ends up working. Is that satisfactory? I'm not sure. On one hand it adds a constraint onto specifying instances of the problem, so it is making things easier. But on the other hand, numbers with arbitrary precision seem to make it so you're largely focusing on the size of the numbers rather than the actual problem.

That's pretty much where I am with this topic. It's quite possible that I explained something poorly or even misunderstood something in what I just said, so I welcome responses pointing out mistakes or asking for clarification.

1. A bunch of them fall out of setting a limit on how many of the non-deterministic paths of an NDTM lead to a correct result. For example, if some constant fraction over half of them accept, you can just guess which path to take and you get BPP, which is basically the class of problems that can be solved in polynomial time by probabilistic algorithms.

2. Technically it would show that it's in FP^NP (meaning FP with an oracle for NP, which basically means "problems that would be in FP if we could solve NP problems in one step"). From here, if we assume P = NP, then we could just solve those NP problems in polynomial time, and as long as we only use the oracle a polynomial number of times that's fine, since a polynomial times a polynomial is still a polynomial; so the oracle doesn't matter if P = NP.

Edited by Ronald Fenelus on
So here is my take on the problem with TSP:

Hand-wavily speaking, the term P refers to the class of problems which can be solved in polynomial time, where the polynomial is a polynomial in the size of the input. If I understood Casey's argument correctly, he wants to assign numbers of size 2^2^n to each of the edges in the graph. The problem I see with that is that it increases the input size exponentially. The number of nodes is not the only thing that is measured by the term "input".

If we take the "real" TSP problem as a function it would get for example two arguments, the number of nodes (n) and an array of length n*(n-1)/2 of costs assigned to each pair of nodes. But if we want to have these costs in orders of 2^2^n we need 2^n bits to represent them. All together the input size is exponentially large (measured in n).

Let's suppose that we have found a polynomial algorithm for TSP and that, when it faces an input of size x, it takes p(x) steps, p being some polynomial. Well, then it takes O(p(2^n)) steps to solve the Casey version of TSP. That this is exponential time compared to n is not a contradiction; it just means you did not measure the right thing -- you should have measured it against the input size, which is itself on the order of 2^n.

Another example: one could just as easily argue that sorting is not in P. For that, just take arbitrarily large numbers that are the same in all but the last few digits. Comparing 2^2^n and 2^2^n+1 takes 2^n steps (measured in bits), because only at the last digit can we distinguish between the two. So even sorting a list with two entries can take exponential time... but only if the input is exponentially large.
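
Here is a small Python illustration of that comparison cost; the naive most-significant-bit-first comparison is just my stand-in for "comparing digit by digit":

[code]
def compare_msb_first(a, b):
    # Compare two non-negative integers bit by bit from the most significant
    # end, counting how many bit positions we had to look at.
    width = max(a.bit_length(), b.bit_length())
    steps = 0
    for i in range(width - 1, -1, -1):
        steps += 1
        bit_a = (a >> i) & 1
        bit_b = (b >> i) & 1
        if bit_a != bit_b:
            return (-1 if bit_a < bit_b else 1), steps
    return 0, steps

n = 10
a = 2 ** (2 ** n)  # 2^2^n
b = a + 1          # differs from a only in the lowest bit
print(compare_msb_first(a, b))  # -> (-1, 1025): all 2^n + 1 bits were examined
[/code]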

I hope that this sheds some light on the issue and that I have not completely missed something.

Edited by Horrowind on