Notes on NP
Consider a computational problem, i.e., a question that you want to answer about a particular structure (such as a graph, a list, ...). Let n be the size of the problem instance (e.g., the # of nodes in the graph, the # of items in the list).
The problem is polynomial-time-computable iff there is a deterministic (i.e., normal computer program-type) algorithm that answers the question in order n^k time for some constant k. That is, for sufficiently large n, the algorithm runs in time <= c(n^k) for some constant c. You can think of the actual time as being the # of steps the algorithm would take on a Turing machine, though we usually look at the program in a "normal" programming language instead (because that is only off by a constant multiple). So anything n^2, n^3, etc. is polynomial, as are things that are n, n log n, log n, and constant (because these are all <= n^2 for large enough n).
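As a toy illustration (not from the notes themselves): here is a deterministic algorithm whose nested loops take about n^2 comparisons, so it is polynomial-time in the sense above.

```python
def has_duplicate(items):
    """Check a list for duplicates by comparing every pair.
    Roughly n^2 steps for a list of n items -- polynomial time."""
    n = len(items)
    for i in range(n):
        for j in range(i + 1, n):
            if items[i] == items[j]:
                return True
    return False
```

(A cleverer version using sorting or hashing would be even faster, but n^2 already puts the problem comfortably in P.)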
If that doesn't make sense, let me know, because the rest isn't going to work unless you understand the explanation above. All of the polynomial-time-computable functions/problems are what is known as P.
Now consider adding nondeterminism. Just as with an NDFSM, you are allowed to have multiple possible ways to solve the problem/multiple legitimate actions at each point. You still only have a polynomial-TIME algorithm, but now you're adding guessing. Think of this as rolling dice at each choice point, with the rule that if *any* path would get you to the right answer, that's good enough. (For example, for propositional formula satisfiability, guess a truth assignment. If there is any set of guesses that succeeds, the nondeterministic computation succeeds.)
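The "any path succeeds" rule can be sketched in code. This is a deterministic *simulation* of the nondeterministic guessing (a real nondeterministic machine would try just one lucky assignment); the clause encoding here is my own choice, not something from the notes.

```python
from itertools import product

def satisfiable(clauses, num_vars):
    """Simulate the nondeterministic guess for propositional satisfiability:
    try every truth assignment, and succeed if ANY one of them works.
    A clause is a list of nonzero ints: k means variable k, -k means NOT k."""
    for assignment in product([False, True], repeat=num_vars):
        if all(any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True  # some path of guesses succeeds
    return False  # no path succeeds
```

Note that the loop makes this simulation take 2^n time; the point of nondeterminism is that each individual path through it only takes polynomial time.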
Of course, to simulate all of the possible choices using a conventional computer would take a factorial or exponential number of steps; the idea is that a nondeterministic computer just has to get lucky. (The popular image of a quantum computer "simultaneously exploring all possibilities" is only a loose analogy, but it conveys the flavor.)
It turns out that, if you're allowed to guess ("nondeterministic polynomial time", or NP), there seem to be a lot more problems you can solve (propositional satisfiability; traveling salesman; etc.). There are still problems you can't solve -- for example, deciding whether two regular expressions define the same language, which (for regular expressions extended with an exponentiation operator, as in Sipser) provably requires exponential space, and hence more than polynomial time even with guessing -- but adding lucky guesswork sure seems to make NP more powerful than P. (Whether it actually does is the famous open P-vs.-NP question.)
Any problem that is at least as hard as every problem in NP is called "NP-hard". So satisfiability and traveling salesman are NP-hard, and so is the regular-expression-equality question. In contrast, sorting is NOT NP-hard; it can be solved in polynomial time without guesswork. NP-hard means "at least as hard as NP".
The problems that can be solved using polynomial-time algorithms with guesswork -- satisfiability, traveling salesman, etc. -- but do not appear to be solvable in polynomial time without guesswork are called "NP-complete". That is, they are just as hard as NP but no harder: they are in NP, and they are NP-hard. Sorting is not NP-complete because it is polynomial; it is *easier* than NP. Regular expression equivalence is also NOT NP-complete; it is even harder. The random guesses won't turn it into a polynomial-time problem; you need more than polynomial time to solve it.
NP-complete problems all have the following characteristics:
- With guesswork, there's a polynomial time algorithm that solves them. (Recall that this is NOT true for regexp equality.)
This means that, if someone *magically* *handed* you the right set of guesses, you could solve the problem in polynomial time without needing to guess. (You'd just use the guesses you were provided.) The number of guesses needed is no more than n^k for some k (i.e., it is no larger than polynomial in n), and it is often linear in n. This set of guesses is sometimes called the certificate, and it is "short" (i.e., not more than polynomial in n). Generally, the polynomial algorithm that uses the guesses just checks to see that the guesses really are a solution. (Again, this is NOT true for regexp equality; there's no short certificate.)
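For satisfiability, "using the guesses you were provided" is just checking them. A minimal sketch (same clause encoding as before, an assumption of mine: ints for literals, a list of bools as the certificate):

```python
def verify_sat(clauses, certificate):
    """Polynomial-time check that a certificate (a truth assignment, one
    bool per variable) really satisfies every clause -- no guessing needed.
    A clause is a list of nonzero ints: k means variable k, -k means NOT k."""
    return all(any(certificate[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses)
```

The verifier runs in time linear in the size of the formula, which is exactly what makes the certificate useful: finding it may be hard, but checking it is easy.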
- If we can find an algorithm to solve any NP-complete problem, we can use that algorithm plus some additional polynomial-time machinery to solve any other NP-complete problem. That is, any NP-complete problem is within a polynomial factor of hardness of any other NP-complete problem. (Again, this is NOT true for regexp equality; a solution to the satisfiability problem CANNOT be converted into a solution to regexp equality using only a polynomial number of steps. But it can be converted into a traveling salesman solver in polynomial time.) This is what we mean when we say that all NP-Complete problems are "equivalent".
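The SAT-to-TSP conversion is too long to show here, but a much smaller reduction between two other NP-complete problems gives the flavor. CLIQUE ("does the graph have k mutually adjacent vertices?") reduces to INDEPENDENT SET ("does the graph have k mutually non-adjacent vertices?") just by complementing the graph -- clearly a polynomial-time transformation:

```python
def complement_graph(n, edges):
    """Reduce CLIQUE to INDEPENDENT SET: a set of k vertices is a clique
    in G exactly when it is an independent set in G's complement.
    Building the complement takes only O(n^2) time."""
    edge_set = {frozenset(e) for e in edges}
    return [(u, v) for u in range(n) for v in range(u + 1, n)
            if frozenset((u, v)) not in edge_set]
```

So an algorithm for INDEPENDENT SET solves CLIQUE at only polynomial extra cost: complement the graph, then ask the INDEPENDENT SET solver.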
A few more notes in response to some questions:
A "short certificate" varies from problem to problem but has the following features:
- it is "short" -- polynomial in n or shorter, often linear
- it certifies the problem solution, i.e., using it you can *verify* that the problem has a solution
Often in practice it is the solution. So, for propositional satisfiability, a truth assignment satisfying the formula in question would be a good short certificate. For the traveling salesman problem (the decision-problem version of which is, "does there exist a tour of length X or less?"), a tour visiting every city would make a good short certificate. [Note that the "find the shortest tour" version of the TSP simply involves running the decision problem repeatedly, e.g., binary searching on X....]
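Checking a TSP certificate is again a quick, guess-free computation. A sketch, assuming a distance matrix and a tour given as a list of city indices (my encoding, not the notes'):

```python
def verify_tour(dist, tour, bound):
    """Certificate check for the TSP decision problem: confirm that `tour`
    visits every city exactly once and that its total length (returning to
    the start) is at most `bound`. Runs in polynomial (here linear) time."""
    n = len(dist)
    if sorted(tour) != list(range(n)):
        return False  # not a permutation of the cities
    length = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
    return length <= bound
```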
Finally, you can indeed use an algorithm for any NP-complete problem to solve any other NP-complete problem (with only an additional overhead of polynomial time). This follows from the Cook-Levin Theorem (which shows that satisfiability is NP-complete, i.e., that every problem in NP reduces to it in polynomial time) together with polynomial-time reductions between the individual problems; both are discussed in Sipser. For example, you could use an algorithm for solving propositional satisfiability (SAT) to solve the traveling salesman problem (TSP) or vice versa, although it may take a polynomial amount of work to convert a problem of one form into a problem of the other and/or a solution of one form into a solution of the other. (In other words, a polynomial-time algorithm for one yields a polynomial-time algorithm for the other.) I'm not going to write out the argument, but you can see the lecture slides from a UMD theory class here: http://www.cs.umd.edu/class/fall2009/cmsc451/lectures/Lec23-sat.pdf Note that this actually shows how to use the (hypothetical) algorithm for Hamiltonian Cycle (HAM) to solve 3-SAT, then shows how HAM and TSP are related. The reduction from SAT to HAM is also in Sipser with a lot more words to help, but he doesn't go all the way to TSP (at least in the edition I own).