A far-off dream: automating a problem in P

In a comment on my last post, Bram Cohen writes:

This whole business of ‘formality’ and ‘review’ is really kind of dumb. A mathematical theorem is only really proven when a computer can verify the proof. Until then, it’s just hand-waving which has some degree of utility when generating a real proof.

Were it standard to present proofs in computer-checkable form, there would be no review process at all. In fact it would be possible to send a proof to a theorem server which would automatically accept any proof which checked out. Had Perelman submitted to one of those, we wouldn’t have had any review process at all, and had complete confidence from day 1, and there wouldn’t be any of this stupid game of who really proved it by making the arguments sufficiently ‘formal’ or ‘detailed’.

I view the switch to doing mathematics in the style just described as inevitable…

Like Bram, I also hope and expect that mathematicians will eventually switch to machine-readable proofs supplemented by human-readable explanations. That would certainly beat the current standard, proofs that are readable by neither machines nor humans.

So then why hasn’t it happened already? Probably the best way to answer this question is to show you the proof, in a state-of-the-art formal verification system called HOL Light, that the square root of 2 is irrational.

let rational = new_definition
`rational(r) = ?p q. ~(q = 0) / abs(r) = &p / &q`;;

let NSQRT_2 = prove
(`!p q. p * p = 2 * q * q ==> q = 0`,
MATCH_MP_TAC num_WF THEN REWRITE_TAC[RIGHT_IMP_FORALL_THM] THEN
REPEAT STRIP_TAC THEN FIRST_ASSUM(MP_TAC o AP_TERM `EVEN`) THEN
REWRITE_TAC[EVEN_MULT; ARITH] THEN REWRITE_TAC[EVEN_EXISTS] THEN
DISCH_THEN(X_CHOOSE_THEN `m:num` SUBST_ALL_TAC) THEN
FIRST_X_ASSUM(MP_TAC o SPECL [`q:num`; `m:num`]) THEN
POP_ASSUM MP_TAC THEN CONV_TAC SOS_RULE);;

let SQRT_2_IRRATIONAL = prove
(`~rational(sqrt(&2))`,
SIMP_TAC[rational; real_abs; SQRT_POS_LE; REAL_POS; NOT_EXISTS_THM] THEN
REPEAT GEN_TAC THEN DISCH_THEN(CONJUNCTS_THEN2 ASSUME_TAC MP_TAC) THEN
DISCH_THEN(MP_TAC o AP_TERM `x. x pow 2`) THEN
ASM_SIMP_TAC[SQRT_POW_2; REAL_POS; REAL_POW_DIV; REAL_POW_2; REAL_LT_SQUARE;
REAL_OF_NUM_EQ; REAL_EQ_RDIV_EQ] THEN
ASM_MESON_TAC[NSQRT_2; REAL_OF_NUM_EQ; REAL_OF_NUM_MUL]);;

Cool — now let’s do Fermat and Poincaré! Any volunteers?

Seriously, the biggest accomplishments to date have included formal proofs of the Jordan Curve Theorem (75,000 lines) and the Prime Number Theorem (30,000 lines). If you want to know which other famous theorems have been formalized, check out this excellent page. Or look at these notes by Harvey Friedman, which cut through the crap and tell us exactly where things stand.

A huge part of the problem in this field seems to be that there’s neither a standard proof format nor a standard proof repository — no TeX or HTML, no arXiv or Wikipedia. Besides HOL Light, there’s also ProofPower, Isabelle, Coq, Mizar, and several other competitors. I’d probably go with Mizar, simply because the proofs in it look the most to me like actual math.

Friedman gives machine-readable proofs fifty years to catch on among “real” mathematicians. That seems about right — though the time could be reduced if the Don Knuth, Tim Berners-Lee, Paul Ginsparg, or Jimmy Wales of proof-checking were to appear between now and then. As usual, it mostly comes down to humans.

Update: Freek Wiedijk put together a fantastic online book, which shows the proofs that the square root of 2 is irrational in 17 different formal systems. The “QED Manifesto” is also worth a look. This manifesto makes it clear that there are people in the formal verification world with a broad enough vision — if you like, the Ted Nelsons of proof-checking. Nelson is the guy who dreamed in 1960 of creating a global hypertext network. In his case, it took 35 years for the dream to turn into software and protocols that people actually wanted to use (not that Nelson himself is at all happy with the result). How long will it take in the case of proof-checking?

23 Responses to “A far-off dream: automating a problem in P”

  1. Anonymous Says:

    Looks like someone has put this on their “to do” list. 🙂

    See:

    http://jorendorff.blogspot.com/2006/03/metamath.html

    http://jorendorff.blogspot.com/2006/04/to-do.html

  2. Mahdi Says:

    I think the existence of computer-checkable proofs does not eliminate the review process at all. In the case of Poincare, a human still has to read the proof to verify what’s being proved is indeed the Poincare conjecture. This is not an obvious task at all as long as these formal languages remain cryptic, and could be indeed as challenging as checking the human-readable version! In case of famous theorems, lists of theorems can be compiled with the statement of theorems already encoded in various formal languages, but for the rest, there is nothing much to do unless we start thinking ourselves in “HOL Light” rather than math, to make sure that everything is in machine language in the first place!

  3. Scott Says:

    In the case of Poincare, a human still has to read the proof to verify what’s being proved is indeed the Poincare conjecture. This is not an obvious task at all as long as these formal languages remain cryptic…

    This is true. But given the choice, I’d much rather verify that a formal statement of Poincare matched the informal statement, than verify Perelman’s proof.

  4. Greg Kuperberg Says:

    I’d probably go with Mizar

    It’s nice to hear that! By an amazing coincidence, Mizar was invented by my uncle.

  5. Greg Kuperberg Says:

    To turn to another side of this post:

    Even if reasonable formal proof-checking existed, you should expect some division of labor. The crowning recent achievement of proof-checking is the formal proof of the Four-Color Theorem due to Georges Gonthier. But the truth is that the Four-Color Theorem was proved in buggy form by Appel and Haken, and then the proof was cleaned up (and simplified) to something reliable by Robertson, Sanders, Seymour, and Thomas.

    Division of labor in turn opens to the door to disputes over credit. Disputes over credit bring out everyone’s personality flaws. For instance if Georges Gonthier were much less reasonable than he actually is, he could claim that he actually proved the Four-Color Theorem. So while proof-checking would be extremely useful for many purposes, I don’t think that you can expect it to eliminate credit disputes.

  6. Osias Says:

    > let rational = new_definition
    > `rational(r) = ?p q. ~(q = 0) /
    > abs(r) = &p / &q`;;

    It looks to me like a easily reusable definition. In several other proofs. Formal proofs won’t need to be soooo long if using definition libraries. (or lemma libraries, etc)

  7. Daniel Says:

    Even if a formal machine-readable version of a proof is published, there is still a need for a human-readable version. One of the purposes of a paper is to let other researchers learn something. If you publish only a formal, hard to read proof, then your collegues cannot admire your brilliant ideas, unless they decode thousands of lines of arcane code. So probably the formal proof should be only supplement of the human-readable paper. In this case, reviewing cannot be completely eliminated: although it is clear that the result is correct, it should be checked whether the human-readable proof is really understandable and it really describes what is going on in the formal proof.

  8. Scott Says:

    By an amazing coincidence, Mizar was invented by my uncle.

    That strikes me less as an amazing coincidence than as evidence for the heritability of certain traits. 🙂

  9. David Molnar Says:

    Blanchet and Pointcheval have a new tool that does automatically checkable proofs in cryptography. Aside from the check, it can also output latex with a short explanation of the proof. This works for them because the style of proof they work with is fairly, well, stylized – there are a couple of transformations that are “legal” and you apply such transforms to go from one well-defined proof step to the next.
    http://www.di.ens.fr/~blanchet/cryptoc/FDH/

  10. GASARCH Says:

    1) An interesting article to read on some of
    these issues is
    DeMillo, Perlis, Lipton
    Social Process and Proofs of Theorems and
    Programs

    Its main point is that math is verified
    over time by a social process of people
    looking at it. It tries to argue that
    Proofs of Verification of Programs is
    hence not feasible. While I don’t really
    buy the conclusion, its still a good
    article.

    2) SCOTT- your VERY FIRST BLOG compalined
    about papers needeing
    a version for a conference
    another version for a journal
    another version after referees
    Diff style files and mandatory Formats.

    If we ALSO insist that the proofs be
    verifiable, that will only ADD to
    what you need to do in order to get
    an article published.
    (Maybe Pearlman is productive
    BECAUSE he doesn’t bother with all that.)

    bill g.

  11. Drew Arrowood Says:

    Some people do math for fun. God bless them. Historically, they have been the ones to come up with more math. New math is useful in modeling the world.

    There are those of us who use math. Not in the sense of figuring out how much concrete to order (a cubic yard is a whole lot more than you think it is), but in the sense of checking the justification for a general assertion. By accepting certain postulates, and certain laws of logic, these users of math then become able to say that there model has a certain range of applicability. One can come to know, say, that all eigenvalues emerging from some kind of matrix are real, and hence models using such objects will not fail by giving imaginary results. Of course, such models may fail in other ways, but that is a good thing to check about a model. Physicists and epidemiologists need this kind of math — will this equation be valid in the limit? can results of these two studies be combined?

    We have a serious problem with math education in the United States — we don’t produce enough people that can do the sort of modeling we need to run our highly technological society, and even the people who CAN build and use the models often can’t do the proofs — they can give very little explanation for the grounds of applicability of their models.

    In an introductory logic class, any of us could teach the basic set-theoretic notions upon which scientific models depended, and how a formal verifer worked (I’ve actually done this for the Kochen-Specker theorem with undergraduates). Then, people could talk about what would work and what wouldn’t in terms of relaxing or strengthening the postulate set — the behavior of the paradigma, the model. Though the mathematicians would not think we had taught those people what math was (shades of Wittgenstein at the end of the Tractatus), we would probably be more successful in teaching a person how to recognize a proof than a course that they conducted by traditional methods, and we would probably be more successful in having them generate proofs that the traditional method is (for automated exaustive search would lead to at least some proofs, and we now have many students who can produce no proofs). The test of whether or not such methods would lead to better problem solving is as yet unperformed, however.

    It’s possible that what happened to poetry is going to happen to math. In the world where very few people can read, and paper is so rare that people copy over the last copy of a treatise by Archimedes, the way in which information was retained and passed on was in a code that has enough redundancy so that they person who knows the code can recreate the information. Poetry does this for us. If a line doesn’t “scan”, we know it is wrong. That is how the oral tradition of the Greeks was passed on for many centuries.

    Now, there are people who make it their life’s project to write poetry — who think that the objects their poems refer to are real, I suppose — and certain of them are even imperialists about their form of life who believe that the highest form of life is to write poetry, and the happiness to be achieved from it is higher than any other happiness.

    However, to most of us, this seems false. We feel completely fulfilled chosing a life without the ultimate goal of structuring words in a meaningful, nonprosaic way. We consider someone an adequate member of our society, despite the fact that he or she cannot create poetry, or even “appreciate” it.

    Am I being a formalist about mathematics here? No, I’m not talking about math (something that certain learned Professors can now testify about), but about the constitution of human society, of which the activities of mathematicians are a part.

    These proof formalizations seem overwhelming and awkward now, but consider an incident from the history of computer science. The first compiler took fifty man years to build. The compilers that came after that were used to — write more compilers. The amount of resources in formal proof systems has been significantly less than the resources used to create compilers.

    We solve problems by building models. We know our models will apply to the phenomena in question — that a manipulation of the model is “like” a manipulation (or perception, regrouping of data) of the world in the relevant sense by validation of the mathematical structure of the model. The right approach to what a proof is and how to recognize one will make it possible for people of limited means to think about the models they employ to understand the world.

    Senator John Edwards (who was a 4.0 major in Engineering at NC State) was once asked by George Stephanopoulous who his favorite philosopher was. His reply was, “I don’t like philosophers — read I.F. Stone’s _The Trial of Socrates_”. Philosophers (in Plato’s sense) are those who “know the forms” — and some people just can’t get that far in their understanding.

    Despite the results of Goedel, this sort of work indicates that physics might, after all, end mathematics as a human activity, except in the “ironic” sense. By the use of the word “irony”, I mean something like this — when I try to give my mother-in-love the program I have to solve Suduko puzzles, she has no interest. In working Suduko puzzles, her goal isn’t to complete the puzzles. The intention, the goal, in fact is to do the puzzle. The proof of a mathematical statement on the way to validation of a model does not have as its ultimate goal the human comprehension of the proof — the goal is to validate the model, and that is the way we know how to do this.

  12. Andy D Says:

    Formalizing proofs also makes us aware of the particular axioms and deductive schemas we need (or think we need) to prove a theorem. This can have complexity-theoretic relevance; for instance, Buss and Krajicek showed that PLS is in P iff anything provable using ‘deep induction’:

    (A(0) & (A(x)–>A(x+1))) –>
    A(x) for all x

    is also provable using ‘shallow induction’:

    (A(0) & (A(x)–>(A(2x) & A(2x+1)))) –>
    A(x) for all x

    , within a restricted theory of arithmetic. See
    http://math.ucsd.edu/~sbuss/ResearchWeb/quaderni/index.html

    Also, the quantitative proof strength of (purely finitistic) propositional proof systems like Resolution and Frege is interesting for complexity theory. For all we know, a collection of basic deductive patterns like modus ponens may supply a natural optimal proof system for tautologies, and a proof that NP = coNP.

  13. Arrowood Says:

    The ISO here works on most any recent Pentium IV computer — you reboot with the CD in the drive, and you have a complete theorem proving environment! It’s easy to burn one CD for each student and to put what they need on those CD’s (plus, it gets them to discover a bit about Linux — almost never a bad thing).

  14. Drew Arrowood Says:

    That site can be slow — if anyone wants the ISO, I can put it up on another server.

  15. Scott Says:

    Despite the results of Goedel, this sort of work indicates that physics might, after all, end mathematics as a human activity, except in the “ironic” sense.

    Drew: I agree that if NP-complete problems were solvable efficiently in the physical world, then mathematics as a human activity would end, except in an “ironic” sense. That, to me, is why P vs. NP is one of the most profound questions that humans have ever asked.

    (Indeed, Gödel’s letter to von Neumann shows he fully understood that this, and not the questions he and Turing were able to answer, is really the key question about automating mathematics.)

    But as a practical matter, I wouldn’t worry too much about it. The smart money is on NP-complete problems not being efficiently solvable in the physical universe. 🙂

  16. Scott Says:

    SCOTT- your VERY FIRST BLOG compalined about papers needeing
    a version for a conference
    another version for a journal
    another version after referees
    Diff style files and mandatory Formats.

    I also complained about people who say “blog” when they mean “entry”. You were one of my main inspirations. 🙂

    If we ALSO insist that the proofs be verifiable, that will only ADD to what you need to do in order to get an article published.

    No, I’m imagining that you’d code a formal proof, and then everything else would be intuitive handwaving, of the sort we all find easy and enjoyable when arguing in front of a blackboard.

  17. Anonymous Says:

    Bill Gasarch wrote:
    An interesting article to read on some of these issues is DeMillo, Perlis, Lipton: Social Process and Proofs of Theorems and Programs

    While I don’t really buy the conclusion, its still a good article.

    I agree that it’s an interesting article, and I also don’t buy the conclusion. Although several of its predictions about the future of formal verification of programs have turned out to be incorrect or misplaced (it’s a 1979 CACM paper), its point of view is still stimulating and insightful for other respects.

  18. James Cook Says:

    I’m all in favor of formalizing proofs, but if we think the reason for doing so is to eliminate the review process, we’re missing the point.

    The purpose of a proof is to convince other people of a result–to communicate (a portion of) the prover’s psychology to the minds of the audience. A proof is rigorous exactly to the degree that it is convincing: if we criticize an argument as unrigorous, it is because we are not totally convinced by some aspect of it. As I see it, the principle goal of the activity we call “mathematics” is this psychological one of “understanding”, or becoming “convinced” of things in this sense.

    In other words, mathematics is really “about” the semantics of the formal symbols, rather than their syntax. (This applies even to parts of mathematics whose object of study is syntax itself, for in that case what is of interest is the syntax of the object language, which is described via the semantics of a suitable metalanguage.)

    Thus, for instance, I wouldn’t be “convinced” of the Poincare’ conjecture simply by hearing that it had been formally verified by a computer, or for that matter by hearing that experts are convinced by Perelman’s proof. At most, what these things convince me of is my ability to become convinced of the conjecture after a sufficient amount of work on my part.

    So why would we bother to formalize a proof in such a way that it could be checked by a computer? The reason is not so that we can be sure it is “true”. The reason is that doing so requires us to reduce our concepts down to their philosophical essences, so that we gain a deep introspective understanding of whatever it is we are talking about. (You’ve got to be _very_ clear about what you mean if you want to convince a nonhuman machine!) It seems obvious to me that this was the goal of pre-computer-age logicians such as Russell and Whitehead (and even Frege, despite the fact that he would have taken exception to my view of mathematics).

    So the reason we should formalize proofs is so that philosophically-oriented humans can read them!

  19. Bram Cohen Says:

    Note that the formal proof of the four-color theorem is a major outlier in terms of complexity of proofs which have been formalized. Some new techniques were developed in the process of generating it. Were I to attempt to contribute to the field, learning that proof would clearly be step 1.

    Clearly the field already has its Ted Nelson (in fact, lots of them) but is missing its Jimmy Wales. It’s also missing its Guido van Rossum, which might be a bigger problem.

  20. Slawekk Says:

    Computer-checkable proofs will not eliminate reviews, but surely they will change the way publishing is done in math. Historically, this has been done through publishers who provided certification that the results are correct, new and interesting. Checking the correctness is the most expensive part of this process. As a result the reviewers have to rely a lot on the credibility of the authors. Computer verification will shift this burden from the reviewer to the author and lower the entry barrier to the field.

    >there is still a need for a human-readable version

    If the formal language is well designed it is also human-readable and there is no need for a separate human-readable informal proof description. Mizar is a good step in this direction, but I think Isabelle‘s Isar is much more readable, especially in ZF logical framework (HOL is a bit different, since standard math is based on ZF rather than HOL).

  21. Bram Cohen Says:

    I don’t understand the claim that computer checking won’t get rid of reviewing in math. Currently 99% of what the review process does in math is check correctness. Sure, you can claim that publication helps spread information, but we have the internet which does a much better job of that, or that it helps pick out what results are really notable, although given the total lack of correlation between papers which get awards and papers which get lots of future citations that seems dubious, and again, people can find out about stuff much more effectively using the internet. Or you can claim that there’s more to a math paper than just the proof, but given the current style of math papers that’s just plain not true.

    It’s possible that we’ll have journals (although undoubtedly online ones) in the future which are entirely devoted to math commentary, philosophy, and interpretation. If so, that would be a hilarious 180 of math papers from antiseptic dryness to stoner pontification.

  22. James Cook Says:

    “I don’t understand the claim that computer checking won’t get rid of reviewing in math. Currently 99% of what the review process does in math is check correctness.”

    Some people suggest otherwise: that ordinarily (that is, in “everyday” mathematics, as opposed to “really big” results) referees tend to view their function as determining, for the benefit of nonspecialist editors, whether the results of a paper are “important” enough to merit publication in the journal concerned. (Some mathematicians, such as Paul Halmos, have been known to become irate at the suggestion that referees are “proof checkers”.) Ask yourself what percentage of rejections are actually handed down solely on the grounds that the results of a paper are incorrect.

    But let’s assume you’re right, and that computer verification of proofs would eliminate the formal institution of peer review in mathematics. I claim that the nature of the pursuit known as “mathematics” is such that, in order for it to continue at this stage, papers would still have to be read (or at least their results verified) by actual humans. In other words, mathematicians would still want to be “convinced” of the results, not just of the fact that they could become convinced. Consequently, an informal institution of peer review would inevitably develop, as mathematicians learn to “sort out” the interesting/important papers from among the mass of correct ones.

  23. Bram Cohen Says:

    To clarify, I understand that reviewers do view it as a big part of their job to decide what’s important, and that such judgements factor a lot into what’s accepted, but I don’t think that review process is particularly valuable, if it’s valuable at all.

    Certainly peer review and highlighting the important stuff is valuable, but having randomly selected reviewers anonymously make such decisions is a ridiculous approach to take, with inherently arbitrary results which haven’t been shown to bear much, if any, correlation to future importance. Much more effective would be to simply publish everything, and have notable people link to the notable things from their blogs.