PHYS771 Lecture 5: Paleocomplexity

Scott Aaronson

By any objective standard, the theory of computational complexity ranks as one of the greatest intellectual achievements of humankind -- along with fire, the wheel, and computability theory. That it isn't taught in high schools is really just an accident of history. In any case, we'll certainly need complexity theory for everything else we're going to do in this course, which is why the next five or six lectures will be devoted to it. So before we dive in, let's step back and pontificate about where we're going.

What I've been trying to do is show you the conceptual underpinnings of the universe, before quantum mechanics comes on the scene. The amazing thing about quantum mechanics is that, despite being a grubby empirical discovery, it changes some of the underpinnings! Others it doesn't change, and others it's not so clear whether it changes them or not. But if we want to debate how things are changed by quantum mechanics, then we'd better understand what they looked like before quantum mechanics.

It's useful to divide complexity theory into historical epochs:

This lecture will be about "paleocomplexity": complexity in the age before P, NP, and NP-completeness, when Diagonalosaurs ruled the earth. Then Lecture 6 will cover the Karpian Explosion, Lecture 7 the Randomaceous Era, Lecture 8 the Early Cryptozoic, and Lecture 9 the Invasion of the Quantodactyls.

We talked on Thursday about computability theory. We saw how certain problems are uncomputable -- like, given a statement about positive integers, is it true or false? (If we could solve that, then we could solve the halting problem, which we already know is impossible.)

But now let's suppose we're given a statement about real numbers -- for example,

-- and we want to know if it's true or false. In this case, it turns out that there is a decision procedure -- this was proved by Tarski in the 1930's, at least when the statement only involves addition, multiplication, comparisons, the constants 0 and 1, and universal and existential quantifiers (no exponentials or trig functions).

Intuitively, if all our variables range over real numbers instead of integers, then everything is forced to be smooth and continuous, and there's no way to build up Gödel sentences like "this sentence can't be proved."

(If we throw in the exponential function, then apparently there's still no way to encode Gödel sentences, modulo an unsolved problem in analysis. But if we throw in the exponential function and switch from real numbers to complex numbers, then we're again able to encode Gödel sentences -- and the theory goes back to being undecidable! Can you guess why? Well, once we have complex numbers, we can force a number n to be an integer, by saying that we want e2πin to equal 1. So we're then back to where we were with integers.)

Anyway, the attitude back then was, OK, we found an algorithm to decide the truth or falsehood of any sentence about real numbers! We can go home! Problem solved!

Trouble is, if you worked out how many steps that algorithm took to decide the truth of a sentence with n symbols, it grew like an enormous stack of exponentials: . So I was reading in a biography of Tarski that, when actual computers came on the scene in the 1950's, one of the first things anyone thought to do was to implement Tarski's algorithm for deciding statements about the real numbers. And it was hopeless -- indeed, it would've been hopeless even on the computers of today! On the computers of the 1950's, it was .

So, these days we talk about complexity. (Or at least most of us do.) The idea is, you impose an upper bound on how much of some resource your computer can use. The most obvious resources are (1) amount of time and (2) amount of memory, but many others can be defined. (Indeed, if you visit the Complexity Zoo, you'll find several hundred of them.)

One of the very first insights is, if you ask how much can be computed in 10 million steps, or 20 billion bits of memory, you won't get anywhere. Your theory of computing will be at the mercy of arbitrary choices about the underlying model. In other words, you won't be doing theoretical computer science at all: you'll be doing architecture, which is an endlessly-fascinating, non-dreary, non-boring topic in its own right, but not our topic.

So instead you have to ask a looser question: how much can be computed in an amount of time that grows linearly (or quadratically, or logarithmically) with the problem size? Asking this sort of question lets you ignore constant factors.

So, we define TIME(f(n)) to be the class of problems for which every instance of size n is solvable (by some "reference" computer) in an amount of time that grows like a constant times f(n). Likewise, SPACE(f(n)) is the class of problems solvable using an amount of space (i.e., bits of memory) that grows like a constant times f(n).

What can we say? Well, for every function f(n), TIME(f(n)) is contained in SPACE(f(n)). Why? Because a Turing machine can access at most one memory location per time step.

What else? Presumably you agree that TIME(n2) is contained in TIME(n3). Here's a question: is it strictly contained? In other words, can you solve more problems in n3 time than in n2 time? (Here the choice of the exponents 3 and 2 is obviously essential. Asking whether you can solve more problems in n4 time than n3 time would just be ridiculous!)

Seriously, it turns out that you can solve more problems in n3 time than in n2 time. This is a consequence of a fundamental result called the Time Hierarchy Theorem, which was proven by Hartmanis and Stearns in the mid-1960's and later rewarded with a Turing Award. (Not to diminish their contribution, but back then Turing Awards were hanging pretty low on the tree! Of course you had to know to be looking for them, which not many people did.)

Let's see how the proof goes. We need to find a problem that's solvable in n3 time but not n2 time. What will this problem be? It'll be the simplest thing you could imagine: a time-bounded analogue of Turing's halting problem.

Clearly we can solve the above problem in n3 steps, by simulating M for n2.5 steps and seeing whether it halts or not. (Indeed, we can solve the problem in something like n2.5 log n steps. We always need some overhead when running a simulation, but the overhead can be made extremely small.)

But now suppose there were a program P to solve the problem in n2 steps. We'll derive a contradiction. By using P as a subroutine, clearly we could produce a new program P' with the following behavior. Given a program M as input, P'

  1. runs forever if M halts in at most n2.5 steps given its own code as input, or
  2. halts in n2.5 steps if M runs for more than n2.5 steps given its own code as input.

Furthermore, P' does all of this in at most n2.5 steps (indeed, n2 steps plus some overhead).

Now what do we do? Duh, we feed P' its own code as input! This gives us a contradiction, which implies that P can never have existed in the first place.

Obviously I was joking when I said the choice of n3 versus n2 was essential. We can substitute n17 versus n16, 3n versus 2n, etc. But there's actually an interesting question here: can we substitute any functions f and g such that f grows significantly faster than g? The surprising answer is no! The function g needs a property called time-constructibility, which means (basically) that there's some program that halts in g(n) steps given n as input. Without this property, the program P' wouldn't know how many steps to simulate M for, and the argument wouldn't go through.

Now, every function you'll ever encounter in civilian life will be time-constructible. But in the early 1970's, complexity theorists made up some bizarre, rapidly-growing functions that aren't. And for these functions, you really can get arbitrarily large gaps in the complexity hierarchy! So for example, there's a function f such that TIME(f(n))=TIME(2f(n)). (Duuuuude. To those who doubt that complexity is better than cannabis, I rest my case.)

Anyway, completely analogous to the Time Hierarchy Theorem is the Space Hierarchy Theorem, which says there's a problem solvable with n3 bits of memory that's not solvable with n2 bits of memory.

Alright, next question: in computer science, we're usually interested in the fastest algorithm to solve a given problem. But is it clear that every problem has a fastest algorithm? Or could there be a problem that admits an infinite sequence of algorithms, with each one faster than the last but slower than some other algorithm?

Contrary to what you might think, this is not just a theoretical armchair question: it's a concrete, down-to-earth armchair question! As an example, consider the problem of multiplying two n-by-n matrices. The obvious algorithm takes O(n3) time. In 1968 Strassen gave a more complicated algorithm that takes O(n2.78) time. Improvements followed, culminating in an O(n2.376) algorithm of Coppersmith and Winograd. But is that the end of the line? Might there be an algorithm to multiply matrices in n2 time? Here's a weirder possibility: could it be that for every ε>0, there exists an algorithm to multiply n-by-n matrices in time O(n2+ε), but as ε approaches 0, these algorithms become more and more complicated without end?

See, some of this paleocomplexity stuff is actually nontrivial! (T-Rex might've been a dinosaur, but it still had pretty sharp teeth!) In this case, a 1967 result called the Blum Speedup Theorem says that there really are problems that admit no fastest algorithm. Not only that: there exists a problem P such that for every function f, if P has an O(f(n)) algorithm then it also has an O(log f(n)) algorithm!

Neither would I! So let's see how it goes. Let t(n) be a complexity bound. Our goal is to define a function f, from integers to {0,1}, such that if f can be computed in O(t(n)) steps, then it can also be computed in O(t(n-i)) steps for any positive integer i. Taking t to be sufficiently large then gives us as dramatic a speedup as we want: for example, if we set t(n):=2t(n-1), then certainly t(n-1)=O(log t(n)).

Let M1,M2,... be an enumeration of Turing machines. Then let Si = {M1,...,Mi} be the set consisting of the first i machines. Here's what we do: given an integer n as input, we loop over all i from 1 to n. In the ith iteration, we simulate every machine in Si that wasn't "cancelled" in iterations 1 to i-1. If none of these machines halt in at most t(n-i) steps, then set f(i)=0. Otherwise, let Mj be the first machine that halts in at most t(n-i) steps. Then we define f(i) to be 1 if Mj outputs 0, or 0 if Mj outputs 1. (In other words, we cause Mj to fail at computing f(i).) We also "cancel" Mj, meaning that Mj doesn't need to be simulated in any later iteration. This defines the function f.

Certainly f(n) can be computed in O(n2 t(n)) steps, by simply simulating the entire iterative procedure above. The key observation is this: for any integer i, if we hardwire the outcomes of iterations 1 to i into our simulation algorithm (i.e. tell the algorithm which Mj's get cancelled in those iterations), then we can skip iterations 1 to i, and proceed immediately to iteration i+1. Furthermore, assuming we start from iteration i+1, we can compute f(n) in only O(n2 t(n-i)) steps, instead of O(n2 t(n)) steps. So the more information we "precompute," the faster the algorithm will run on sufficiently large inputs n.

To turn this idea into a proof, the main thing one needs to show is that simulating the iterative procedure is pretty much the only way to compute f: or more precisely, any algorithm to compute f needs at least t(n-i) steps for some i. This then implies that f has no fastest algorithm.

Puzzle 1 From Last Week

Can we assume, without loss of generality, that a computer program has access to its own code? As a simple example, is there a program that prints itself as output?

The answer is yes: there are such programs. In fact, there have even been competitions to write the shortest self-printing program. At the IOCCC (the International Obfuscated C Code Contest), this competition was won some years ago by an extremely short program. Can you guess how long it was: 30 characters? 10? 5?

The winning program had zero characters. (Think about it!) Admittedly, a blank file is not exactly a kosher C program, but apparently some compilers will compile it to a program that does nothing.

Alright, alright, but what if we want a nontrivial self-printing program? In that case, the standard trick is to do something like the following (which you can translate into your favorite programming language):

In general, if you want a program to have access to its own source code, the trick is to divide the program into three parts: (1) a part that actually does something useful (this is optional), (2) a "replicator," and (3) a string to be replicated. The string to be replicated should consist of the complete code of the program, including the replicator. (In other words, it should consist of parts (1) and (2).) Then by running the replicator twice, we get a spanking-new copy of parts (1), (2), and (3).

This idea was elaborated by von Neumann in the early 1950's. Shortly afterward, two guys (I think their names were Crick and Watson) found a physical system that actually obeys these rules. You and I, along with all living things on Earth, are basically walking computer programs with the semantics

Puzzle 2 From Last Week

If water weren't H2O, would it still be water?

Yeah, this isn't really a well-defined question: it all boils down to what we mean by the word water. Is water a "predicate": if x is clear and wet and drinkable and tasteless and freezable to ice, etc. ... then x is water? On this view, what water "is" is determined by sitting in our armchairs and listing necessary and sufficient conditions for something to be water. We then venture out into the world, and anything that meets the conditions is water by definition. This was the view of Frege and Russell, and it implies that anything with the "intuitive properties" of water is water, whether or not it's H2O.

The other view, famously associated with Saul Kripke, is that the word water "rigidly designates" a particular substance (H2O). On this view, we now know that when the Greeks and Babylonians talked about water, they were really talking about H2O, even though they didn't realize it. Interestingly, "water = H2O" is thus a necessary truth that was discovered by empirical observation. Something with the same properties as water but a different chemical structure would not be water.

Kripke argues that, if you accept this "rigid designator" view, then there's an implication for the mind-body problem.

The idea is this: the reductionist dream would be to explain consciousness in terms of neural firings, in the same way that science explained water as being H2O. But Kripke says there's a disanalogy between these two cases. In the case of water, we can at least talk coherently about a hypothetical substance that feels like water, tastes like water, etc., but isn't H2O and therefore isn't water. But suppose we discovered that pain is always associated with the firings of certain nerves called C-fibers. Could we then say that pain is C-fiber firings? Well, if something felt like pain but had a different neurobiological origin, would we say that it felt like pain but wasn't pain? Presumably we wouldn't. Anything that feels like pain is pain, by definition! Because of this difference, Kripke thinks that we can't explain pain as "being" C-fiber firings, in the same sense that we can explain water as "being" H2O.

Some of you look bored. Dude -- this is considered one of the greatest philosophical insights of the last four decades! I'm serious! Well, I guess if you don't find it interesting, philosophy is not the field for you.

[Discussion of this lecture on blog]

[← Previous lecture | Next lecture →]

[Return to PHYS771 home page]