I'm very sorry that I missed this video live. In my defence, I was seeing the new Star Wars film at the time.
Note that I haven't watched day 233 yet, so it's possible that some of Casey's questions from day 232 have been answered in that stream. Nonetheless, let's press on.
The key misunderstanding here is that P, NP, etc. are sets of decision problems, i.e. problems that have a Boolean answer.
That is true, but it doesn't matter as much as most people think. We can also say (without abusing the term too much) that the corresponding "give me a solution" problem is also in P or in NP.
The reason is that it's usually extremely easy to turn a decision-problem solver into a "give me a solution" solver. Casey talked a bit about this on day 232, but in terms of a decision solver which walks paths. Actually, we don't even need that! Taking the Travelling Salesman Problem as an example, we can use the decision problem as a query inside an algorithm which computes the path.
Suppose that we used binary search to find the minimum cost of a TSP tour. Then the algorithm looks like this:
Input: A graph, G. A minimum cost, k.
Output: S, a subset of the edges of G which represents a Travelling Salesman tour.

Let S be the empty set.
for (all edges e in G) {
    if (G with e removed has a TSP solution of cost k) {
        remove e from G
    }
    else {
        add e to S
    }
}
Return S

The idea is that an edge is definitely part of a minimum-cost Travelling Salesman tour if the graph with that edge removed no longer has a tour of cost k. By checking all edges (and removing the unnecessary ones as you go), you can identify a tour.
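To make both steps concrete, here is a runnable sketch of the whole reduction: binary search over the decision problem to find the minimum cost, then the edge-elimination loop above. The decision "oracle" here is a brute-force permutation check purely so the sketch runs; in the real argument it stands in for the hypothetical polynomial-time decision algorithm. The graph representation (a dict from frozenset edge to weight) is my own choice, not anything from the stream.

```python
from itertools import permutations

def has_tour_of_cost(edges, vertices, k):
    # Decision oracle: "does the graph have a TSP tour of cost at most k?"
    # Brute force here, just to make the sketch executable.
    for perm in permutations(vertices[1:]):
        tour = [vertices[0], *perm, vertices[0]]
        cost, ok = 0, True
        for a, b in zip(tour, tour[1:]):
            e = frozenset((a, b))
            if e not in edges:
                ok = False
                break
            cost += edges[e]
        if ok and cost <= k:
            return True
    return False

def min_tour_cost(edges, vertices):
    # Binary search the minimum cost, querying only the decision oracle.
    lo, hi = 0, sum(edges.values())
    while lo < hi:
        mid = (lo + hi) // 2
        if has_tour_of_cost(edges, vertices, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

def find_tour_edges(edges, vertices):
    # The self-reduction from the text: delete an edge whenever a tour of
    # minimum cost k survives without it; otherwise the edge is essential.
    k = min_tour_cost(edges, vertices)
    g = dict(edges)
    tour = {}
    for e in list(g):
        w = g.pop(e)
        if not has_tour_of_cost(g, vertices, k):
            g[e] = w      # edge is needed: put it back...
            tour[e] = w    # ...and record it as part of the tour
    return tour

# A 4-vertex example: the cheap "ring" edges form the optimal tour.
edges = {frozenset(('a', 'b')): 1, frozenset(('b', 'c')): 1,
         frozenset(('c', 'd')): 1, frozenset(('d', 'a')): 1,
         frozenset(('a', 'c')): 5, frozenset(('b', 'd')): 5}
vertices = ['a', 'b', 'c', 'd']
print(min_tour_cost(edges, vertices))        # -> 4
print(sorted(sorted(e) for e in find_tour_edges(edges, vertices)))
```

Note that the elimination loop calls the decision oracle once per edge, and the binary search calls it O(log of the total weight) times, which is the "polynomially many invocations" point made below.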
If the TSP decision problem takes polynomial time on a deterministic computer (i.e. it's in P), then the above algorithm takes polynomial time on a deterministic computer, because it invokes the TSP decision algorithm polynomially many times, and if p(x) and q(x) are polynomials, then so is p(q(x)). Similarly, if the TSP decision problem takes polynomial time on a nondeterministic computer (i.e. it's in NP), then the above algorithm takes polynomial time on a nondeterministic computer.
While this isn't the most efficient algorithm, for this area of theory it doesn't matter. Polynomial is polynomial, whether the polynomial has degree 1 or 100.
You can't always do this. One famous example is the composite number problem. The decision problem "is this number composite?" is known to be in P, but the corresponding "give me a solution" problem (given a number N, give me a factor which isn't 1 or N) is not known to be in P. But there are many cases where you can, and when you can, it's usually obvious how to do it. One of the key cases where it works is NP-complete problems.
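You can see this gap in code. The result that compositeness is in P is the AKS primality test; the sketch below instead uses a Miller-Rabin test (deterministic for n below about 3.2 billion with these fixed bases), because it's short and runnable. The point is the same: it answers "composite?" with a yes or no, and a "yes" hands you no factor whatsoever.

```python
def is_composite(n):
    # Miller-Rabin with fixed bases; deterministic for n < 3,215,031,751.
    # (Compositeness being in P is really the AKS test; this is a stand-in.)
    if n < 4:
        return False          # 0, 1, 2, 3: none of these is composite
    if n % 2 == 0:
        return True
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in (2, 3, 5, 7):
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return True       # a witnesses compositeness -- but names no factor
    return False

print(is_composite(561))      # -> True (561 = 3 * 11 * 17, a Carmichael number)
print(is_composite(97))       # -> False
```

Even the witness base a that proves n composite tells you nothing usable about n's factors; that's the whole asymmetry.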
I say "computer", but I'm fudging a little there. The second common misunderstanding is that P and NP refer to the time taken to execute the algorithm on a real computer; they are actually defined in terms of the number of steps taken by a Turing machine. This doesn't matter for P and NP specifically (since a Turing machine can almost certainly emulate your favourite programming model with at most polynomial slowdown), but it matters when you're talking about the difference between (say) linear and quadratic. But more to the point, it matters when you're trying to understand what we mean by "size of the problem".
The third common misunderstanding (and this, by the way, is the answer to Casey's question about binary searching the problem space on day 232) is that P and NP are defined in terms of the size of the input, and you have to define "size" very carefully.
A Turing machine, you may recall, works on a tape with a fixed-size alphabet. To make things a bit simpler, we'll assume that it's binary; each tape cell can contain a 0, a 1, or be empty. The size of the input of the Travelling Salesman Problem is not the number of vertices or edges in the graph. It is the number of bits sitting on the tape.
There are lots of ways to represent numbers between 1 and N. However, if the numbers are equally likely, you can't do any better than log N bits each. Casey objected to the binary-search approach of finding the minimum cost by noting that you could simply scale up all of the edge weights, say, by converting each weight n to 2^(2^n). However, the size of the problem would then be bigger, because you would need more bits to represent each weight; around 2^n bits each, in fact. The algorithm still takes polynomial time in the size of the problem!
Of course, if you convert n to 2^(2^n), the numbers that you get aren't equally likely. You can presumably compress the input before feeding it to the algorithm. But if the algorithm understands the compression scheme, then it also understands that not all weights are equally likely, and can use this information to make the search more efficient.
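Python's arbitrary-precision integers make the bit-counting claim easy to check. A weight of n takes about log2(n) bits, while the scaled-up weight 2^(2^n) takes 2^n + 1 bits: the input itself has blown up exponentially, so the algorithm's running time relative to the input size hasn't changed.

```python
n = 5
blown_up = 2 ** (2 ** n)

print(n.bit_length())          # -> 3: a weight of 5 fits in 3 bits
print(blown_up.bit_length())   # -> 33: 2^(2^5) needs 2^5 + 1 = 33 bits
```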
This particular misconception usually comes up first in the context of prime factoring. Every CS theory student goes through a phase where they think it's easy to factor a number in polynomial time. After all, if you have a number N, you only need to test N factors (or, even better, the square root of N), and each test is just a long division, which takes... I don't know, N^2 or N^3 time, certainly polynomial.
The problem with this line of reasoning is that the "size of the input" isn't N, it's log N. Only an idiot (or someone with a specific compression problem!) stores the number N using N bits. Yeah, this prime factoring algorithm is polynomial in N, but it's exponential in the size of the input.
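The student's algorithm is just trial division, and it fits in a few lines. The loop body really is cheap; the trouble is the loop count. For a b-bit number N there can be on the order of sqrt(N) ~ 2^(b/2) candidates to try: polynomial in N, exponential in the size of the input.

```python
def smallest_factor(n):
    # Trial division up to sqrt(n): the "obviously polynomial" student
    # algorithm. O(sqrt(N)) divisions = O(2^(b/2)) for a b-bit input.
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d      # a factor which isn't 1 or n
        d += 1
    return None           # no nontrivial factor: n is prime (or n < 4)

print(smallest_factor(91))     # -> 7
print(smallest_factor(97))     # -> None
```

Try it on a 30-digit semiprime and the exponential-in-bits behaviour stops being theoretical rather quickly.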