The arc of complexity is long, but it bends toward lower bounds

As MIT grad student Jelani Nelson rightly pointed out to me, an historic world event took place on Tuesday, January 20—an event that many of us have awaited for decades, one that we thought we’d never live to see—and I inexcusably failed my readers by neglecting to blog about it.  The event in question, as everyone knows, was Mark Braverman posting to his web page what looks to be a proof of the Linial-Nisan Conjecture.  The LN conjecture, posed in 1990, held that

Polylog-wise independence fools AC^0.

Alright, let me try again in English.  The conjecture says that no logic circuit, composed of a polynomial number of AND, OR, and NOT gates (of unbounded fan-in) arranged in a constant number of layers, can distinguish n input bits x_1,…,x_n that are truly random, from n input bits that look random on every subset of (say) n^{0.001} bits, but that could be correlated in arbitrary ways across larger scales.  In other words, if such a circuit accepts truly random bits with probability close to 1, then it also accepts the pseudorandom bits with probability close to 1, and vice versa.  If you want to distinguish the random bits from the pseudorandom bits with noticeable bias, then you need a more powerful kind of circuit: either greater depth (say, log(n) layers instead of O(1)), or more gates (say, exponentially many), or more powerful gates (say, XOR or MAJORITY gates instead of just AND, OR, and NOT).  To a constant-depth, polynomial-size, AND/OR/NOT circuit (which we call an AC^0 circuit for short—don’t ask why), local randomness looks just the same as global randomness.  Or so says the Linial-Nisan Conjecture.
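
Stated a bit more formally (my paraphrase, not Linial and Nisan’s exact quantifiers): call a distribution D over n-bit strings k-wise independent if every k of the n bits, viewed on their own, are distributed exactly uniformly.  The conjecture is then that for every polynomial-size, constant-depth AND/OR/NOT circuit C, and every k-wise independent distribution D with k = polylog(n),

    |Pr_{x~D}[C(x)=1] − Pr_{x~U}[C(x)=1]| = o(1),

where U denotes the uniform distribution over n-bit strings.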

Now, we’ve known since the eighties that AC^0 circuits have serious limitations.  In particular, we’ve known lots of specific pseudorandom distributions that fool them.  What Linial and Nisan conjectured, and Braverman appears to have proved, is that any distribution will do the job, just so long as it “looks random locally.”

A year and a half ago, Bazzi proved the Linial-Nisan conjecture in the special case of depth-two circuits, in a 64-page tour de force.  Then Razborov gave an essentially 2-page proof of the same result.  (Need I explain how awesome that is?)  Braverman extends Bazzi’s result to circuits of any constant depth; his proof is almost as short as Razborov’s.

In proving these lower bounds, the name of the game is the polynomial method (the subject of my FOCS tutorial).  Given an AC^0 circuit C, you first construct a low-degree real polynomial that approximates C pretty well on most inputs.  (How do you construct such a thing?  And what does “pretty well” mean?  Save it for the comments section.)  Then you observe that no low-degree polynomial could possibly distinguish a random string from a string that only looks random locally.  Why?  Because a low-degree polynomial, by definition, is a sum of local terms, and if none of those individual terms can distinguish truly random bits from pseudorandom ones (as was assumed), then their sum can’t distinguish them either, by the deep principle of the universe we call linearity of expectation.  (By contrast, an AND or OR of terms could in principle detect “global” properties of the input that none of the individual terms detected—which is why we couldn’t just apply such an argument to the AC^0 circuit directly.)  It follows, then, that the original circuit couldn’t have distinguished local randomness from global randomness very well either, which is what we wanted to show.
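
To spell out that linearity-of-expectation step: write the low-degree polynomial as p(x) = Σ_S c_S Π_{i∈S} x_i, a sum of monomials with |S| ≤ deg(p).  Then

    E_D[p(x)] = Σ_S c_S E_D[Π_{i∈S} x_i] = Σ_S c_S E_U[Π_{i∈S} x_i] = E_U[p(x)],

where D is the pseudorandom distribution, U is the uniform one, and the middle equality holds because each monomial looks at only |S| ≤ deg(p) bits, on which D was assumed to be exactly uniform (assuming, of course, that deg(p) is at most the locality parameter, say n^{0.001}).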

So everything boils down to constructing these low-degree approximating polynomials and proving they have the right properties.  And in that context, what Braverman does is almost hilariously simple.  Given an AC^0 circuit C, he first constructs a low-degree polynomial p that agrees with C on most inputs (from whatever fixed probability distribution you want), using the celebrated approximation method of Razborov and Smolensky.  He then observes that, when p fails to agree with C, there’s another AC^0 circuit E, of depth slightly greater than C’s, that detects the failure.  Next he finds a low-degree polynomial q that approximates E in L_2-norm, using the also-celebrated 1993 theorem of Linial-Mansour-Nisan.  Then he looks at p(1-q), and shows that it’s a polynomial that usually agrees with C, but when it does disagree, usually isn’t too far off.  And then … well, at that point he’s really almost done.
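
Schematically (my gloss, suppressing all the quantitative bookkeeping):

    p ≈ C on most inputs;   E(x)=1 whenever p(x)≠C(x);   ||q−E||_2 small.

Wherever p agrees with C, we have E=0 and (typically) q≈0, so p(1-q) ≈ p = C.  Wherever p disagrees with C, we have E=1 and (typically) q≈1, so the factor 1-q ≈ 0 damps whatever wild value p might take there.  And the degree of p(1-q) is deg(p)+deg(q), which is still polylogarithmic, so the linearity-of-expectation argument above still applies.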

While I had no involvement whatsoever with this beautiful result, I’m pleased to have unwittingly set in motion a chain of events that led to it.  Since the summer, I’ve been trying to get as many lowerbounderati as possible interested in BQP versus PH, a central open problem of quantum complexity theory that’s resisted progress since the prehistoric days of 1993.  (There are certain problems that I mentally classify as “rabbits,” after the Killer Rabbit of Caerbannog from Monty Python and the Holy Grail.  BQP vs. PH is one of the fluffiest, most adorable rabbits ever to leap for my throat.)

Concretely, the goal has been to construct an oracle relative to which BQP (Bounded-Error Quantum Polynomial-time, the class of problems that are feasible for a quantum computer) is not contained in PH (the Polynomial-time Hierarchy, a generalization of NP).  Such a separation would give us probably our best evidence to date that BQP is not contained in NP—or loosely speaking, that not only can quantum computers solve certain problems exponentially faster than classical ones, they can solve certain problems exponentially faster than classical computers can even verify the answers.

(NerdNote: We do have oracles relative to which BQP⊄NP, and indeed BQP⊄MA.  But we still don’t have an oracle relative to which BQP⊄AM.  And that sticks in the craw, since we know that AM=NP under a derandomization hypothesis.)

Now, it occurred to me that BQP versus PH is closely related to the Linial-Nisan Conjecture.  That’s not quite as surprising as it sounds, since you can think of PH as the “exponentially scaled-up version” of AC^0 … so that fighting PH ultimately boils down to fighting AC^0.

Alright, so consider the following problem, which we’ll call Fourier Checking.  You’re given black-box access to two Boolean functions f,g:{-1,1}^n→{-1,1}, and are promised that either

  1. f and g were both generated uniformly at random (independently of each other), or
  2. f and g were generated by first choosing a random 2^n-dimensional unit vector v, then setting f(x)=sgn(v_x) and g(x)=sgn((Hv)_x), where H represents the Fourier transform over Z_2^n.

The problem is to decide which, with small probability of error.

It’s not hard to see that Fourier Checking is in BQP (i.e., is efficiently solvable by a quantum computer).  For to solve it, you just go into a uniform superposition over all x∈{-1,1}^n, then query f, apply a Quantum Fourier Transform, query g, and see if you’re left with (1) random garbage or (2) something close to the uniform superposition that you started with.
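
If you want to play with this numerically, here’s a toy sketch in Python (mine, a back-of-the-envelope check, not anything from Braverman’s paper or elsewhere).  Tracing through the algorithm above, the final amplitude on the uniform superposition works out to the quantity (1/2^{3n/2}) Σ_{x,y} f(x)(−1)^{x·y} g(y); the sketch just evaluates that for both ways of generating f and g.  Under method 1 it’s around ±2^{-n/2}, while under method 2 it concentrates around a constant (2/π in the limit, if I’m doing the Gaussian calculation right):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 10
    N = 2 ** n

    # The 2^n-by-2^n matrix of (-1)^{x.y}, built by Sylvester doubling.
    M = np.array([[1.0]])
    for _ in range(n):
        M = np.kron(M, np.array([[1.0, 1.0], [1.0, -1.0]]))

    def amplitude(f, g):
        """(1/2^{3n/2}) * sum over x,y of f(x) * (-1)^{x.y} * g(y)."""
        return f @ M @ g / N ** 1.5

    # Method 1: f and g uniformly random and independent.
    f1 = rng.choice([-1.0, 1.0], size=N)
    g1 = rng.choice([-1.0, 1.0], size=N)

    # Method 2: f = sgn(v), g = sgn(Hv) for a random unit vector v.
    v = rng.standard_normal(N)
    v /= np.linalg.norm(v)
    f2 = np.sign(v)
    g2 = np.sign(M @ v / np.sqrt(N))   # Hv, with H the normalized transform

    print(amplitude(f1, g1))   # tiny: on the order of 2^{-n/2}
    print(amplitude(f2, g2))   # bounded away from zero

(The algorithm’s acceptance probability is the square of this amplitude, which is why the two cases are distinguishable with O(1) repetitions.)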

On the other hand, one can show that:

  • A certain generalization of Bazzi’s Theorem (from “local randomness” to “local almost-randomness”—as usual, ask in the comments section) would imply that Fourier Checking is not in an important subclass of PH called AM (for “Arthur-Merlin”).  And thus, we’d get an oracle relative to which BQP⊄AM.
  • The analogous generalization of the full Linial-Nisan Conjecture would imply that Fourier Checking is not in PH.  And thus, we’d get our long-sought oracle relative to which BQP⊄PH.

After realizing the above, I tried for months to prove the requisite generalization of Bazzi’s Theorem—or better yet, get someone else to prove it for me.  But I failed.  All I managed to do was to goad Razborov into proving his amazing 2-page version of Bazzi’s original theorem, which in turn inspired Braverman to shoot for the full Linial-Nisan Conjecture.

In what appears to be a cosmic prank, about the only conjectures in this area that still haven’t been proved are the ones I needed for the quantum computing problem.  And thus, I will offer $100 for a proof that Fourier Checking is not in AM, $200 for a proof that it’s not in PH.  In so doing, my hope is to make Tuesday, January 20, 2009 remembered by all as the day our economy finally got back on track.

30 Responses to “The arc of complexity is long, but it bends toward lower bounds”

  1. Scott Says:

    You see, people? Finally I post about something serious, and everyone just goes on debating Mac vs. Linux vs. PC. Never wonder again why this blog concentrates so much on the trivial, the ephemeral, the procrastinacious… 🙂

  2. rw Says:

    Scott,

    What is the link between Kolmogorov complexity (especially the Minimum Description Length principle) and space and time complexity? They seem to be aspects of the same thing, to me. Thanks!

  3. John Armstrong Says:

    We knew Mark was headed for something big when he started at Yale. I think his transfer to Toronto was for the best, given that he wouldn’t have had much of anyone to spark this sort of work in our math department. Not that I knew of, anyway.

  4. Scott Says:

    rw: They’re related but different. The Kolmogorov complexity K(x) is the length of the shortest program that outputs a string x. A key point about K(x) is that it ignores the running time of the program (as well as the space and so on)—all it cares about is the program’s length. Thus, Kolmogorov complexity is really part of computability theory (and maybe information theory), not complexity theory.

    The other point people like to stress about K(x) is that it’s defined for individual strings x (up to an additive constant). By contrast, in complexity theory, usually we only care about asymptotic behavior (e.g. time and space usage) as the input size increases. But this difference is not really essential. And indeed, complexity theorists do study a measure that’s similar in spirit to Kolmogorov complexity: namely the circuit complexity C(f), which is the size of the smallest circuit that computes a Boolean function f. Just like the Kolmogorov complexity, the circuit complexity is large for almost all functions, but small for most of the functions we actually care about.

    The difference between the two, again, boils down to the fact that in circuit complexity we care about the cost of computing an object, whereas in Kolmogorov complexity, we only care about the cost of specifying it. And so for example, there exist Boolean functions with tiny Kolmogorov complexities (since they have small specifications), but huge circuit complexities (since they’re hard to compute). On the other hand, if the circuit complexity is small, then the Kolmogorov complexity must be small also.
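
    (In symbols, that last implication: under any reasonable encoding, a circuit of size s can be written down using O(s log s) bits, so K(tt(f)) = O(s log s) plus an additive constant, where tt(f) is the 2^n-bit truth table of f.)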

    A final twist is that you can study the time-bounded Kolmogorov complexity: that is, the length of the shortest program that outputs a given string x and halts in a “reasonable” amount of time. But as shouldn’t be too surprising, it turns out that time-bounded Kolmogorov complexity is roughly equivalent to circuit complexity! (Which, in turn, is the “nonuniform” version of time complexity.)

  5. rw Says:

    Scott, thank you for that explanation! The link between K(x) and circuit complexity was something of which I had not previously been aware. This semester I am doing a reading course, with a graph theorist, on complexity, and I asked him the question I asked you. He mentioned the computability theory vs. complexity theory aspect, as well. Do you think these fields are reconcilable? Isn’t “information” just “information” no matter how you look at it? Am I silly for wanting to know? Can this post be made any shorter? Thanks!

  6. Scott Says:

    rw, to talk about “reconciling” computability theory and complexity theory suggests there’s some sort of conflict between them! There isn’t. They just study computation from different angles, one concerned with efficient use of resources and the other not. It’s like how ornithology and culinary science both study the chicken, but they ask different questions, and the answers they get are presumably compatible.

  7. Benjamin Says:

    Do you think this Fourier Checking problem is more likely / easier to be shown to lie outside PH than Recursive Fourier Sampling?

  8. Scott Says:

    Benjamin, Fourier Checking should be much easier to analyze than RFS. And the separation you get should be exponential, not just quasipolynomial.

  9. elad Says:

    Scott: what do you mean by “local almost-randomness”? You might want to note that Braverman states that his results even work when the distribution is almost k-wise independent. Is that enough?

  10. John Sidles Says:

    Scott says: … the circuit complexity is large for almost all functions, but small for most of the functions we actually care about. …

    That short sentence has remarkable depth. For example, if we change “circuit complexity” to “simulation complexity” and “function” to “dynamical system”—which if you think about it, can be regarded as no change at all—then the statement remains true.

    And this explains why we systems engineers (especially quantum systems engineers) are so very interested in modern complexity theory: the practical applications—or as Scott puts it, “dynamical systems we actually care about”—have sufficient scope as to offer substantial promise of helping (again in Scott’s phrase) “our economy finally get back on track.”

    Not least, in creating new jobs, and exciting new mathematical challenges, for the next generation of complexity theorists! 🙂

  11. Scott Says:

    elad: It’s not enough. With that notion of “almost,” you can only handle 1/n^{polylog(n)} additive error, which isn’t good enough for me.

    What I can guarantee about my distribution D is the following: for all conjunctions C of n^{o(1)} literals,

    1-ε ≤ Pr_D[C] / (1/2)^{|C|} ≤ 1+ε,

    where ε = 1/n^{o(1)}. (Indeed, if it helps, I can push |C| up to n^{1/4} and ε down to 1/n^{1/4}.)

    So, I need to handle much “larger” deviations from uniformity, but the advantage is I can guarantee those deviations are multiplicative rather than additive.

  12. anon Says:

    Win on the MLK reference in the title

  13. Job Says:

    In the Fourier Checking problem, is there always a set of polynomially many x_i such that, by checking the f(x_i) and g(x_i) pairs, it’s likeliest that f and g were generated by method 2?

  14. Scott Says:

    Job: Excellent question. The main thing I know how to prove is that no such set exists, of any subexponential size. Indeed, it’s on that basis that I conjecture the problem is not in PH.

  15. Stephen Harris Says:

    http://www.santafe.edu/research/publications/workingpapers/03-12-068.pdf
    Effective Complexity, Murray Gell-Mann and Seth Lloyd

    “In addition to logical depth, we can utilize a quantity that is, in a sense, inverse to it, namely Bennett’s “crypticity,” [2] which is, in rough terms, the time necessary to go from the description of an entity to a short program that yields that description. As an example of a situation where crypticity is important, consider a discussion of pseudorandomness. These days, when random numbers are called for in a calculation, one often uses instead a random-looking sequence of numbers produced by a deterministic process. Such a pseudorandom sequence typically has a great deal of crypticity. A lengthy investigation of the sequence could reveal its deterministic nature and, if it is generated by a short program, could correctly assign to it a very low AIC. Given only a modest time, however, we could fail to identify the sequence as one generated by a simple deterministic process and mistake it for a truly random sequence with a high value of AIC.”
    ———————-
    “A measure that corresponds much better to what is usually meant by complexity in ordinary conversation, as well as in scientific discourse, refers not to the length of the most concise description of an entity (which is roughly what AIC is), but to the length of a concise description of a set of the entity’s regularities. Thus something almost entirely random, with practically no regularities, would have effective complexity near zero. So would something completely regular, such as a bit string consisting entirely of zeroes. Effective complexity can be high only [in] a region intermediate between total order and complete disorder.”

    [AIC = Algorithmic Information Content → Kolmogorov complexity]

    Mark Braverman and Stephen Cook:
    “When we discuss the computational complexity of problems, we are not only interested in whether a function can be computed or not, but also in the time it would take to compute a function.”

  16. John Sidles Says:

    Stephen Harris’ post comes reasonably near to asking questions of great practical interest, like the all-too-commonly encountered: “Which task has the lesser complexity: debugging this badly written code, versus writing fresh code from scratch?”

    And yet it’s not so obvious (to me anyway) how best to formalize this class of question, which calls to mind Pauli’s famous blank rectangle, labeled “This is to show I can paint like Titian. Only details are missing.” What creatures from the Complexity Zoo are lurking nearby, formally speaking?

    … and I too join the chorus of praise for that MLK quote!

  17. Gil Kalai Says:

    Here is a problem (perhaps silly) that comes to mind: Does the conclusion of the Linial-Nisan conjecture (=Braverman’s result) hold for monotone threshold circuits, namely for bounded-depth circuits with (monotone) threshold gates?

  18. Bram Cohen Says:

    What is the best known upper bound? That is, what’s the ‘easiest’ locally-random-but-globally-nonrandom function anyone has been able to construct, and what depth of circuit is necessary to distinguish it from a truly random function?

  19. Scott Says:

    Gil: Interesting question. I don’t know. What do we know, in general, about bounded-depth monotone threshold circuits?

  20. Louis Wasserman Says:

    We know depth (k-1) monotone threshold circuits can require exponential size to compute functions computable by linear-size, depth-k monotone circuits (Hastad, Goldmann 91), which is about the only result I can find.

    Essentially, Gil, you’re asking if Braverman’s result holds for mTC^0, which I’m almost positive does not trivially follow. (I’d certainly be curious to see whether or not extending the result to mTC^0 would be significantly easier than for TC^0, and why!) I haven’t (yet) convinced myself whether or not it’s even likely to be true.

  21. hk Says:

    Is it clear that a similar result doesn’t hold for threshold circuits of depth 1 or 2 (with negations)?

    AC^0, mTC^0, and depth-2 TC^0 are classes we have exponential lower bounds for.

  22. Gil Says:

    As Luca pointed out, we cannot hope for such a strong result for mTC^0. Maybe a much weaker result (regarding what “fools” means) would hold.

    A conjecture about mTC^0 of Benjamini, Schramm, and me asserts that the Fourier coefficients decay above level (log size)^O(depth). The decay is weak (not exponential, as in the result of Linial-Mansour-Nisan), even for majority.

    I would hesitate to make such a conjecture for monotone functions in TC^0 because of the dark clouds of Razborov-Rudich natural-proofs-type theorems. (If I remember correctly it is Naor-Reingold for TC^0, but I am not sure.) But I do not know a counterexample. (In fact, I am not aware of a monotone function in TC^0 not in mTC^0, but I am sure there are such functions; the analogous question for AC^0 is dealt with in a paper by Ajtai-Gurevich if I am not wrong.)

  23. Scott Says:

    Bram: Sorry your interesting question got stuck in the queue!

    A simple example of a locally-random-but-globally-nonrandom distribution is the uniform distribution over codewords of a linear error-correcting code: that is, for some “good” generator matrix A∈Z_2^{n×m}, the uniform distribution over Av where v is drawn uniformly from Z_2^m. If you want a polylog-wise independent distribution, it’s known that it suffices to take m=polylog(n). And in that case, a constant-depth polynomial-size circuit will be enough to distinguish the distribution from uniform. Why? Because the XOR of some polylog(n) bits will already be enough to distinguish the distribution from uniform (by a rank argument), and it’s known that for constant d, the XOR of k bits can be computed by a circuit of depth d and size ~2^{k^{O(1/d)}}. Plugging in k=polylog(n), we find that a const-depth poly-size circuit suffices.
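
    For concreteness, here’s a toy numpy sketch of that rank argument (mine; and note that the random A below is just a stand-in for a “good” generator matrix, so the resulting distribution isn’t guaranteed to be polylog-wise independent, but the parity-based distinguisher works the same way). Among any m+1 rows of A there’s a linear dependency over Z_2, and the corresponding XOR of m+1 output bits is identically zero on codewords, while being a fair coin on uniform bits:

        import numpy as np

        rng = np.random.default_rng(1)
        n, m = 32, 8                    # n output bits from an m-bit seed
        A = rng.integers(0, 2, size=(n, m), dtype=np.uint8)

        def sample(num):
            """Draw num strings x = Av (mod 2), v uniform in {0,1}^m."""
            v = rng.integers(0, 2, size=(m, num), dtype=np.uint8)
            return (A @ v) % 2          # shape (n, num): one string per column

        def find_dependency(rows):
            """Nonzero 0/1 vector c with c @ rows == 0 (mod 2), by Gauss-Jordan."""
            k, w = rows.shape           # needs k > rank(rows), e.g. k = w + 1
            aug = np.concatenate([rows % 2, np.eye(k, dtype=np.uint8)], axis=1)
            r = 0
            for col in range(w):
                piv = next((i for i in range(r, k) if aug[i, col]), None)
                if piv is None:
                    continue
                aug[[r, piv]] = aug[[piv, r]]
                for i in range(k):
                    if i != r and aug[i, col]:
                        aug[i] ^= aug[r]
                r += 1
            return aug[r, w:]           # a row whose left block reduced to zero

        c = find_dependency(A[: m + 1])
        assert not ((c @ A[: m + 1]) % 2).any()

        xs = sample(2000)
        print(((c @ xs[: m + 1]) % 2).mean())   # 0.0: one small parity nails it
        unif = rng.integers(0, 2, size=(m + 1, 2000), dtype=np.uint8)
        print(((c @ unif) % 2).mean())          # ~0.5 on truly random bits

    Since this particular XOR touches only m+1 bits, taking m = polylog(n) it fits (by the size bound above) into a constant-depth, polynomial-size circuit.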

    So in Braverman’s proof, you can really only hope for a lower bound against constant-depth circuits. And as it turns out, looking at his proof, if your distribution is (log n)^c-wise independent, then you get a lower bound against depth ~O(√c) circuits. This is essentially tight, except that possibly the depth could be improved from O(√c) to O(c) (or maybe not—but I don’t know of any obstruction to this).

  24. elad Says:

    Scott: Thanks for the info! These days I’m thinking about learning over non-uniform distributions (COLT deadline coming up), and the class of distributions that you described is actually one of the classes I’m thinking of.

    (One easy result: it is almost trivial to learn DNF with poly(n) clauses to high accuracy in time n^polylog, over any distribution D where for all conjunctions C of >=polylog(n) literals, Pr_D[C]

  25. Bram Cohen Says:

    Scott, related to all that stuff (I *think* I follow it) there’s an ‘obvious’ fact that to compute the XOR of all the input bits using AND, OR, and NOT gates of unbounded fan-in requires log(n) depth. Is that result an early step in this proof, using the polynomial method?

    Also, any idea what happens if you allow XOR or MAJORITY gates of unbounded fan-in? Does that make all locally random functions distinguishable from truly random ones in constant depth, or only some of them?

    Come to think of it, the circuit which has a bunch of outputs which are all MAJORITY of some assignment of the inputs is an interesting one. I wonder if the number of gates needed for calculating those is the number of inputs plus the number of outputs, or the number of inputs times the number of outputs.

  26. Scott Says:

    Bram, the ‘obvious’ fact you allude to (PARITY∉AC^0) is of course one of the great achievements in the history of complexity theory!

    If you want to compute XOR by a polynomial-size circuit with AND, OR, and NOT gates, I think depth ~log(n)/loglog(n) is necessary and sufficient. And yes, Braverman’s proof contains both the Linial-Mansour-Nisan theorem and the Razborov-Smolensky theorem as pieces, and both of those pieces imply the result about PARITY requiring ~log(n)/loglog(n) depth.

    As for your second question … the trouble is that the uniform distribution is itself a “locally-random distribution”, and that one’s certainly hard to distinguish from the uniform distribution even with XOR and MAJORITY gates! 🙂 But I assume you meant something like locally-random distributions with small support or low min-entropy. Using a probabilistic argument, it’s not hard to show that there exist such distributions that are indistinguishable from the uniform distribution, even by arbitrary polynomial-size circuits (let alone constant-depth circuits with XOR or MAJORITY gates)! Under a complexity assumption (e.g., that strong one-way functions exist), such distributions can even be explicit and sampled from efficiently.

    Now, if you want an explicit, unconditional locally-random distribution with small min-entropy, which can’t be distinguished from the uniform distribution by constant-depth polynomial-size circuits with MAJORITY gates … well, then you’re essentially asking for an explicit circuit lower bound against TC^0 circuits, which does not yet exist. On the other hand, if you only care about XOR gates (rather than MAJORITY gates), then I think you can construct such a distribution based (for example) on the “Hamming weight mod 3” function. We know from Razborov-Smolensky that the Hamming weight mod 3 is not computable by constant-depth polynomial-size circuits with AND, OR, and XOR gates. And you should be able to plug that lower bound into (say) a Nisan-Wigderson-type pseudorandom generator to get the distribution you want.

  27. harrison Says:

    Per Gil’s question: I’m by no means an expert in circuit complexity, but even if the result is true for (m)TC^0, I’d think you’d need tougher machinery to prove it (indeed, unless I’m horribly mistaken, the reason TC^0 is a bigger complexity class — and thus the question is nontrivial — is that threshold gates aren’t approximable by low-degree polynomials!)

    I confess that I have no real intuition for Fourier analysis, though, so I might be missing something in the conjecture about the coefficients of mTC^0; if Gil or anyone else could elaborate, that’d be fantastic!

  28. Arnab Says:

    A question about further improvements on the Braverman result. What is the conjectured amount of independence needed to fool AC^0 functions? I understand that for d=2, there is a result by Luby and Velickovic that rules out the possibility of (log m)-wise independence being enough. For d>2, is there an example showing that the exponent on the log needs to grow with d? No, right? Is there anything better than log m known for larger constant d?

    Also, using random functions as pseudo-random generators, what is the dependence of the seed length on the depth of the AC^0 circuit being fooled?

  29. Scott Says:

    Arnab: See my previous message to Bram Cohen. Basically, the natural conjecture would be that (log m)^{O(d)}-wise independence suffices to fool depth-d circuits, while Braverman’s result shows that (log m)^{O(d^2)} suffices. We know that (log m)^{O(d)} is necessary because a poly(m)-size, depth-(d+O(1)) circuit can compute PARITY on (log m)^d bits.

  30. Arnab Says:

    Thanks! I see. So, the conjectured seed length is similar to the seed length for the unconditional Nisan PRG. Perhaps there is some stronger notion of local randomness which gives a shorter seed length? I guess this is not very optimistic, given the lack of progress on this question since Nisan.

    Also, regarding Bram Cohen’s question above about fooling circuits with MAJORITY gates, there is also the paper by Viola which gives a PRG with seed length 2^{O(sqrt(log m))} to fool constant depth circuits of size m with at most log m MAJORITY gates. (Compare this to the polylog(m) seed length given by the Nisan PRG as well as Braverman’s result for constant depth circuits without MAJORITY gates).