Video P vs. NP

P vs. NP and the Computational Complexity Zoo

Actually, I came across this video a long time ago, but I didn't understand what it meant at the time.
Yesterday, when it was mentioned on the stream, I remembered it, and since Casey wanted a good explanation of the topic, I thought it might help.
This is another video about Computational Complexity from MIT OpenCourseWare.

It is a very nice and informative lecture and not hard to follow (you can skip the proof of "most decision problems are not solvable" if you don't understand it; it won't affect the rest).
The instructor explains some of the computational classes: P, NP, NP-complete, NP-hard, EXP, EXP-complete, EXP-hard and R, and shows the relations between them. He also explains reductions as well as the P vs. NP problem.

At 49:11 he mentions that the Traveling Salesman Problem is one of the NP-complete problems, which means no polynomial-time algorithm for it is known yet. All NP-complete problems can be reduced to each other, meaning once one of them is solved in polynomial time, all the others will be solved too, which hasn't happened yet.
I'm a current computer science student, and the definitions of P, NP, and NP-complete are part of one of the modules I'm currently revising for.

The key point that gets misunderstood here is that P, NP, etc. are sets of decision problems, i.e. problems that have a Boolean answer.

For example, the Travelling Salesman decision problem is: "Is there a route of this length or shorter?"

P is the set of decision problems for which there is an algorithm that solves them in polynomial time.

NP is the set of decision problems for which there is a certifier, i.e. an algorithm that takes a "certificate" or "solution" (for example, the route for travelling salesman) and verifies it in polynomial time.

So, for example, the certifier for the travelling salesman problem looks at the proposed route and simply verifies that it is a valid tour and is as short as required.
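To make this concrete, here is a minimal sketch of what such a certifier could look like (the representation and names are my own, not from the lecture): given a distance matrix, a proposed route, and the bound k, it checks in polynomial time that the route visits every city exactly once and that its total length is at most k.

#include <cstdint>
#include <vector>

// Hypothetical certifier for the TSP decision problem
// "is there a tour of length at most k?".
// dist[i][j] is the distance between city i and city j,
// route is the certificate: an ordering of all the cities.
// Runs in O(n^2) time, i.e. polynomial in the input size.
bool certify_tsp(const std::vector<std::vector<int64_t>> &dist,
                 const std::vector<int> &route,
                 int64_t k)
{
    size_t n = dist.size();
    if (route.size() != n) return false;

    // Every city must appear exactly once.
    std::vector<bool> seen(n, false);
    for (int city : route) {
        if (city < 0 || (size_t)city >= n || seen[city]) return false;
        seen[city] = true;
    }

    // Sum the tour length, including the edge back to the start.
    int64_t total = 0;
    for (size_t i = 0; i < n; ++i) {
        total += dist[route[i]][route[(i + 1) % n]];
    }
    return total <= k;
}

Note that the certifier never has to search for a route; it only checks the one it is given, which is why it runs in polynomial time.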

I'm very sorry that I missed this video live. In my defence, I was seeing the new Star Wars film at the time.

Note that I haven't watched day 233 yet, so it's possible that some of Casey's questions from day 232 have been answered in that stream. Nonetheless, let's press on.

The key point that gets misunderstood here is that P, NP, etc. are sets of decision problems, i.e. problems that have a Boolean answer.

That is true, but it doesn't matter as much as most people think. We can also say (without abusing the term too much) that the corresponding "give me a solution" problem is also in P or in NP.

The reason is that it's usually extremely easy to turn a decision problem into a "give me a solution" problem. Casey talked a bit about this on day 232, but framed it in terms of a decision solver which walks paths. Actually, we don't even need that! If we take the Travelling Salesman problem as an example, we can use the decision problem as a query (an oracle) in an algorithm which computes the path.

Suppose that we have already used binary search to find the minimum cost, k, of a TSP tour. Then an algorithm for recovering an actual tour looks like this:

Input: A graph G. The minimum tour cost k (found, say, by binary search).
Output: S, a subset of the edges of G which forms a Travelling Salesman tour of cost at most k.

Let S be the empty set.

for (all edges e in G) {
    if (G with e removed still has a TSP tour of cost at most k) {
        remove e from G      // e isn't needed; discard it for good
    }
    else {
        add e to S           // every remaining tour of cost <= k uses e, so keep it
    }
}

Return S


The idea is that an edge is definitely part of a Travelling Salesman tour of cost at most k if the graph with that edge removed has no such tour. By checking all edges (and removing the unnecessary ones as you go), you can identify a tour.

If the TSP decision problem takes polynomial time on a deterministic computer (i.e. it's in P), then the above algorithm takes polynomial time on a deterministic computer, because it invokes the TSP decision algorithm polynomially many times, and if p(x) and q(x) are polynomials, then so is p(q(x)). Similarly, if the TSP decision problem takes polynomial time on a nondeterministic computer (i.e. it's in NP), then the above algorithm takes polynomial time on a nondeterministic computer.

While this isn't the most efficient algorithm, for this area of theory it doesn't matter. Polynomial is polynomial, whether the polynomial has degree 1 or 100.

You can't always do this. One famous example is the composite number problem. The decision problem "is this number composite?" is known to be in P, but the corresponding "find me a solution" problem (given a number N, give me a factor which isn't 1 or N) is not known to be in P. But there are many cases where you can, and when you can, it's usually obvious how to do it. One of the key cases where it works is NP-complete problems.

I say "computer", but I'm fudging a little there. The second common misunderstanding is that P and NP refer to the time taken to execute the algorithm on a Turing machine. This doesn't matter for P and NP specifically (since a Turing machine can almost certainly emulate your favourite programming model with at most polynomial slowdown), but it matters when you're talking about the difference between (say) linear and quadratic. But more to the point, it matters when you're trying to understand what we mean by "size of the problem".

The third point that's commonly misunderstood (and this, by the way, is the answer to Casey's question about binary searching the problem space on day 232) is that P and NP are defined in terms of the size of the input, and you have to define "size" very carefully.

A Turing machine, you may recall, works on a tape with a fixed-size alphabet. To make things a bit simpler, we'll assume that it's binary: each tape cell can contain a 0, a 1, or be empty. The size of the input of the Travelling Salesman Problem is not the number of vertices or edges in the graph. It is the number of bits sitting on the tape.

There are lots of ways to represent a number between 1 and N. However, if the numbers are equally likely, you can't do any better than log N bits. Casey objected to the binary search approach of finding the minimum cost by noting that you could simply scale up all of the edge weights, say, by converting each weight n to 2^(2^n). However, the size of the problem would then be bigger, because you would need to use more bits to represent each weight; around 2^n bits each, in fact. The algorithm still takes polynomial time in the size of the problem!
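To make the counting concrete, here is a minimal sketch of that binary search, written against an assumed decision oracle (the oracle and the names are placeholders of mine, not anything defined on the stream). The number of oracle calls is about log2 of the largest possible tour cost, i.e. proportional to the number of bits it takes to write the weights down, so it stays polynomial in the size of the input however large you make the individual weights.

#include <cstdint>
#include <functional>

// Find the minimum tour cost by binary search, given a decision oracle
// "is there a tour of cost <= k?". We assume at least one tour exists
// and that its cost is at most max_cost (e.g. the sum of all edge weights).
// Number of oracle calls: O(log(max_cost)), i.e. O(bits in the input).
int64_t minimum_tour_cost(const std::function<bool(int64_t)> &tour_of_cost_at_most,
                          int64_t max_cost)
{
    int64_t lo = 0, hi = max_cost;
    while (lo < hi) {
        int64_t mid = lo + (hi - lo) / 2;
        if (tour_of_cost_at_most(mid)) {
            hi = mid;        // a tour of cost <= mid exists; tighten the upper bound
        } else {
            lo = mid + 1;    // no such tour; the optimum is larger than mid
        }
    }
    return lo;
}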

Of course, if you convert n to 2^(2^n), the numbers that you get aren't equally likely. You can presumably compress the input before feeding it to the algorithm. But if the algorithm understands the compression scheme, then it also understands that not all weights are equally likely, and can use this information to make the search more efficient.

This particular misconception usually comes up first in the context of prime factoring. Every CS theory student goes through a phase where they think it's easy to factor a number in polynomial time. After all, if you have a number N, you only need to test N factors (or, even better, the square root of N), and each test is just a long division, which takes... I don't know, N^2 or N^3 time, certainly polynomial.

The problem with this line of reasoning is that the "size of the input" isn't N, it's log N. Only an idiot (or someone with a specific compression problem!) stores the number N using N bits. Yeah, this prime factoring algorithm is polynomial in N, but it's exponential in the size of the input.
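To spell that out in code, here is a minimal trial-division sketch (my own illustration):

#include <cstdint>

// Trial division: return a factor of N which isn't 1 or N, or 0 if there isn't one.
// This does O(sqrt(N)) divisions, which is polynomial in N. But N is written
// with only b = log2(N) bits, so that's roughly 2^(b/2) divisions: exponential
// in the size of the input.
uint64_t trial_division(uint64_t N)
{
    for (uint64_t d = 2; d <= N / d; ++d) {
        if (N % d == 0) {
            return d;    // found a factor which isn't 1 or N
        }
    }
    return 0;            // no non-trivial factor: N is prime (or N <= 1)
}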
To have a “wormhole” you need a worm to dig it. If no worm, no hole either…

In short, this is the solution of the “P vs. NP” problem. The answer is “No”: P is not equal to NP, as there is no “worm” to dig a shortcut and thereby bypass the tragedy of exponential algorithms. There is no shortcut, and such a shortcut is impossible. The purpose of this article is to prove this.

To achieve this, the approach of this article is to prove that a specific NP-complete problem cannot be solved non-exponentially. This will prove that P≠NP, because otherwise a contradiction appears: if one NP-complete problem is proved unsolvable in polynomial time and another NP-complete problem is solved in polynomial time, then the first would also be solved, since it can be converted into the second one. Obviously not possible.

To prove this assertion we can use the “Subset sum problem”. If we prove that it cannot be solved non-exponentially, that will be the end of the “P vs. NP” discussion.

And the answer for the “Subset sum problem” is very short and simple:

You cannot avoid exponents (x^n) in a problem where combinatorics is involved. There is no such possibility. And that, in short, is the solution of “P vs. NP”: P is not equal to NP.

You don’t need to read any more of this article unless you are keen on brain gymnastics and developing the intellect by thinking about complex issues. If you are interested only in the answer, that is all: exponents are inevitable when combinatorics is involved.

And you cannot solve the subset sum problem without combinatorics. Theoretically, if such a solution were possible, it would mean the existence of a functional dependency between the numbers. A functional dependency, calculating by formula, is the only way to find the right answer among the many wrong answers. But this would contradict the initial condition of the problem: finding a subset in a set of arbitrary integers. The problem is defined as universal, meaning the algorithm must work on ANY set of integers, and this excludes any functional dependency. As no dependency exists, combinatorics is the only way. And combinatorics means exponents.

You can do whatever you want with the initial set (below are some thoughts on this), but nevertheless, whatever you do, some subset will remain for combinatorial checking. Therefore the only result of your effort will be a hypothetical reduction of the exponent size (x^n, x^(n/2)…x^(n/1000)…). But in all these cases some exponential element will remain. Generally, the most used approach in this area is splitting the initial set into subsets, because this way the exponential power goes down. But an exponential component still remains. In addition, splitting has its limits. You cannot split too much, because you will start missing some combinations that may contain a possible solution. To avoid this you will have to combine the subsets with each other again, and this will bring you back to a previous level of splitting. But that belongs to the next part of the analysis. The real solution of “P vs. NP” is as follows:

The “Subset sum problem” is an NP-complete problem. The “Subset sum problem” cannot be solved without using combinatorics at all. The existence of combinatorics means unavoidable exponents. And this means that no purely polynomial solution is possible. As the “Subset sum problem” is proven unsolvable in polynomial time, and as it is NP-complete, therefore:

P≠NP

2. Some thoughts on the theoretically lowest needed complexity for the “Subset sum problem”

Now let’s return to the initial epigram…

To have a “wormhole” you need a worm to dig it. If no worm, no hole either…

So let’s start searching for “worms”, i.e. for something that could help us reduce this horrifying check-all-combinations calculation burden. What could help us avoid it? Is there a “worm”, a “privilege”, any concept that could help us?

Unfortunately, no.

All we have is a set of numbers. And nothing more. No number is marked as preferable. No arithmetic operation on the numbers can help us avoid some of the calculations. There is nothing that can help us. No “worm” to dig. No “filtering privilege” to throw these unneeded and exhausting combinations in the trash. No such savior among the numbers.

Still, we do have something that can help, just not enough. We have 3 important things that can be used for filtering:

- We have the initial set itself and can manipulate it somehow; generally, we can split it.

- We have a target number that the subsets must sum to; for simplification we can take this to be zero (0).

- We have the value of the currently calculated combination.

That is all we have. In fact, these elements can be used for some filtering, but not enough to make any potential algorithm non-exponential. In one way or another, exactly these 3 elements are used in all algorithms for solving the problem, including the most effective ones. They can be used in different ways: directly or indirectly, by summing, sorting, merging, etc. But all of these ways descend, visibly or not, from these 3: manipulating the set, the target number, and the value of the currently calculated combination.

So if, based only on these 3 factors (worms, privilege filters), we cannot have a non-exponential solution, then we cannot have such a solution at all.

So the purpose of the second part of this article is to find the filtering limit based on the set, the currently calculated value and the target number.

Let’s start with the set, because it is used together with the other 2 factors.

Splitting the set is a powerful tool for lowering the exponential power. Working with smaller subsets, you can reduce the number of combinations to be checked. Used along with the other filters, the overall amount of work goes down. Still, splitting does not remove the exponential nightmare for the computer; it just makes it smaller.

In addition, there are limits to splitting. If you split only in 2, you lower the exponent to the level of the bigger subset. If the 2 subsets are equal, you get an exponent of n/2. So why not split into 4, 5…1000 subsets? Why not split until we reach subsets of 1 number and solve the problem instantly? The answer is obvious. If you split into more than 2 subsets, you risk starting to miss some of the combinations that need to be checked. To avoid this you will have to combine subsets with each other again, and this will bring you back to a previous level of splitting. Ultimately you will be back at the initial split into 2 subsets. In that case, with each subset having just one other subset as a possible “partner”, there is no risk of missing combinations. So the theoretical limit of the filtering power of splitting is n/2.

Now let’s move to the target number. It is the weaker filter. The main thing it can be used for is to conveniently divide the initial set into 2 subsets (for instance positive and negative, after sorting). So it is an interesting “split point” that can be useful in some cases. In fact you can split at any number, but if it is not the target number you will lose some of the filtering power of that number. Some algorithms do exactly this, but only because the “profit” from the other split is bigger than the loss from this one. Anyway, this is not such an important issue; right now we are on the matter of the target value.

As we said above, dividing the set into subsets is very useful because this way we lower the complexity of calculating all possible combinations in each of the 2 subsets. For instance, if the 2 subsets are equal, the complexity is lowered from O(2^n) to O(2*2^(n/2)). If they are not equal, we get different reductions in each of the 2 subsets.

Why is the target number interesting? Because it splits a hypothetically sorted set of numbers into 2 subsets such that we do not need to calculate the combinations of numbers coming from only one side of the target number. I.e., if the split is into positive and negative numbers and the target is 0, then you don’t need to check any combination containing only positive or only negative numbers. You will save a lot of your processor’s time.

Still, the target number as a filtering factor is not so important. Sometimes it can be “more profitable” to split at some other number.

Once we have 2 subsets, theoretically we can go back to the initial situation by combining each with each, and then the complexity goes back to O(2^n).

And here comes the second filter: the value of the current combination. Using it, we can prevent part of exactly this “going back”. In fact, we really will go back, since we will have to check many combinations that contain members of both subsets. But using the “current value” we can avoid part of the job.

So what is the “filtering power” of the current value?

Generally we can benefit in 2 ways: by filtering the final calculated values of other combinations, or by filtering the initial numbers that start creating combinations, as well as already calculated combinations from which the next level of combinations descends.

These are 2 different approaches that are used in different algorithms. In this article I will not talk about any specific algorithm, nor focus on inventing algorithms. I am only analyzing the limits of the filters that are, in one way or another, implemented in these algorithms.

When filtering against the final calculated values of other combinations, using the current value of the current combination can be very powerful. If the subsets are sorted, the current value can filter almost all useless combinations. Imagine a set of 1000 calculated positive values that are sorted, and you have a negative current value. You can theoretically filter out (by some kind of genius algorithm) all, or almost all, of the values that do not match your current value. So the theoretical filtering power of the current value is close to 100% when used against other calculated values that are sorted.

Anyway, in this case computing power has to go into calculating the other values and sorting them. In the case of a set divided into 2 subsets, we have O(2^(n/2)) for creating each subset, plus the cost of sorting. So if we want to use this extreme filtering power we have to “invest” in the initial calculations for creating the subsets. If the subsets are equal in size, the complexity will be O(2^(n/2)). But if they are unequal in size, we risk increasing the complexity to that of the bigger of the 2 subsets (for instance O(2^(n/4)) for the small one and O(2^(3n/4)) for the big one).
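For reference, here is a rough sketch of that split-sort-and-search idea, essentially the meet-in-the-middle approach (the code is my own illustration, not taken from Horowitz and Sahni): enumerate all subset sums of both halves, sort one side, then binary-search it for the complement of each sum from the other side.

#include <algorithm>
#include <cstdint>
#include <vector>

// All 2^k subset sums of the numbers in 'half' (the empty subset included).
static std::vector<int64_t> all_subset_sums(const std::vector<int64_t> &half)
{
    std::vector<int64_t> sums(1, 0);
    for (int64_t x : half) {
        size_t count = sums.size();
        for (size_t i = 0; i < count; ++i) {
            sums.push_back(sums[i] + x);   // each existing sum, with x added
        }
    }
    return sums;
}

// Meet-in-the-middle subset sum: is there a subset of 'set' summing to 'target'?
// Roughly O(2^(n/2) * n) time instead of O(2^n): split the set into two halves,
// enumerate both, sort one side, and binary-search it for the complement of
// each sum from the other side. (The empty subset is allowed here, so a target
// of 0 is trivially found; excluding it is a small tweak.)
bool subset_sum_exists(const std::vector<int64_t> &set, int64_t target)
{
    std::vector<int64_t> left(set.begin(), set.begin() + set.size() / 2);
    std::vector<int64_t> right(set.begin() + set.size() / 2, set.end());

    std::vector<int64_t> left_sums = all_subset_sums(left);
    std::vector<int64_t> right_sums = all_subset_sums(right);
    std::sort(right_sums.begin(), right_sums.end());

    for (int64_t s : left_sums) {
        // Is there a sum from the right half equal to target - s?
        if (std::binary_search(right_sums.begin(), right_sums.end(), target - s)) {
            return true;
        }
    }
    return false;
}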

So we come to the second approach. We can use the filtering power of the current calculated value to filter combinations before they are calculated. For instance, with a current calculated value of -5 and a target of 0, we can filter out all combinations that are based on positive numbers larger than 5. This means all 1-number possibilities and all 2-, 3-, 4-…n-number combinations that descend from 6 and higher numbers. In addition, at the next step we can filter all 2-number combinations that exceed a value of 5. For instance, we will check the number 2, but will filter the combination “2+4” and all subsequent combinations. And so on.

All this can be done if we use some kind of tree-building algorithm in which every new combination comes from the previous one. Combined with an initially sorted set (or subset), this filtering can be very impressive in size. And don’t forget that in this case we have not yet invested computing power in calculating all possible combinations.
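As an illustration of that kind of pruning (my own sketch, not any particular published algorithm): with the positive numbers sorted in ascending order, as soon as the next number is bigger than what the current value still needs, the whole branch below it can be skipped.

#include <cstdint>
#include <vector>

// Depth-first search over subsets of the ascending-sorted positive numbers,
// pruning every branch whose partial sum would overshoot 'remaining'.
// Returns true if some subset of positives[index..] sums exactly to 'remaining'.
bool find_exact_sum(const std::vector<int64_t> &positives, size_t index, int64_t remaining)
{
    if (remaining == 0) return true;    // the current value plus the chosen positives hit the target
    for (size_t i = index; i < positives.size(); ++i) {
        if (positives[i] > remaining) {
            return false;               // everything from here on is even bigger: prune the branch
        }
        if (find_exact_sum(positives, i + 1, remaining - positives[i])) {
            return true;
        }
    }
    return false;
}

// Example: the current combination of negatives sums to -5 and the target is 0,
// so the positives have to contribute exactly 5:
//     bool found = find_exact_sum(sorted_positives, 0, 5);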

But what is the limit of this filtering before calculating? In fact the calculation is not complex. In this case the dependency is roughly “linear”. Not exactly linear, because the function is based on discrete numbers and is not continuous, but the calculations are essentially the same. The number of calculations that can be filtered out and avoided depends on the absolute size of the current calculated value. If it is very low, i.e. the ratio of its absolute value to the maximal theoretical value for the subset is low, then we will be able to filter almost all of the combinations. Imagine a current value of -2 and a subset of 1000 positive numbers with 999 of them greater than 2. You will be able to filter out 99.9% of all these combinations.

But look at the other side. Imagine a current calculated value of -10000 and 1000 positive numbers with 999 of them lower than 10000. You will avoid just 0.1% of the combinations. Of course you will still be able to filter among the 2-, 3-…n-number combinations. But theoretically, if you have a current value that is close to the maximal possible calculated value (in absolute terms; it can be negative), then you will be able to filter almost nothing.

So this dependency is roughly linear. The math here leads us to filtering about half of the possible combinations. The function is something like:

z = (y(x1) + y(x2) + … + y(xn)) / n

y(x) = (x / constant1) * constant2

Where

z – final number of calculated combinations

y - number of checked combinations for a given value of x

x – current calculated value

constant1 – theoretical max calculated value

constant2 – average density of combinations in set

With a linear-type function like this, the sum of all the members y comes out to roughly half of all possible combinations.

So, is that good enough?

If you don’t have any initial split, this saving is not so impressive: you only get down to O(2^n/2), i.e. half of all combinations. If we try a mixed approach, we can calculate and sort everything in the smaller subset, and then try filtering with the bigger subset. In our previous case (O(2^(n/4)) for the small subset and O(2^(3n/4)) for the big one), the optimization brings us down to a level of O(2^(3n/4)/2). Obviously much worse than the O(2^(n/2)) we get with 2 equal-sized subsets.

And that is the end. That is all. All the possibilities have been checked. The power to filter the combinations is limited to O(2^(n/2)), and no algorithm based on these three “worms” can do better. In fact, this is the answer to the question of why, after 4 decades of searching, no algorithm better than the one of Horowitz and Sahni* has been discovered. That algorithm is very close to the theoretically lowest possible complexity O(2^(n/2)), and any better algorithm will not bring an enormous improvement.

So finally: if the “Subset sum problem” is proven unsolvable in non-exponential time, and as it is an NP-complete problem, therefore: P≠NP
Now go publish this and claim your million-dollar prize: http://www.claymath.org/millennium-problems/millennium-prize-problems
Useful link.