How to train your program verifier

(risemsr.github.io)

34 points | by matt_d 4 days ago

4 comments

  • woodruffw 2 hours ago
    At a very quick look, no evidence is given that the "bugs" found in requests are in fact reachable, i.e. not prevented by construction. And sure enough, the very first one is impossible because of a validating guard[1]: `address_in_network` only gets called after `is_valid_cidr`, which enforces the presence of a slash.

    I think we should hold claims about effective static analysis and/or program verification to a higher standard than this.

    [1]: https://github.com/psf/requests/blob/4bd79e397304d46dfccd76f...

    • JimDabell 57 minutes ago
      > the very first one is impossible because of a validating guard[1]: `address_in_network` only gets called after `is_valid_cidr`, which enforces the presence of a slash.

      It’s correct to flag this code. The check is performed manually outside of the function in question. If you call the function directly, the bug surfaces.

      There is no mention in the function documentation of the validation requirement, making it easy to call incorrectly. Also, if it is required to call the validator before calling this function, then the function could just call it itself.

      In short, it’s possible to make this code safe by definition, but instead it relies upon the developer to always make the undocumented right choices every single time it is called. I would expect something more rigorous from verified code.
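      The contrast can be sketched with a toy stand-in. This is illustrative, not requests' actual source; `cidr_contains` and `cidr_contains_safe` are hypothetical names for the "assumes a validated input" and "safe by construction" variants:

```python
import ipaddress

def cidr_contains(ip: str, net: str) -> bool:
    # Undocumented precondition: `net` must look like "a.b.c.d/bits".
    # A caller that skips the external validator hits a crash here.
    netaddr, bits = net.split("/")  # ValueError if there is no "/"
    network = ipaddress.ip_network(f"{netaddr}/{bits}", strict=False)
    return ipaddress.ip_address(ip) in network

def cidr_contains_safe(ip: str, net: str) -> bool:
    # Safe by construction: the function validates its own input, so no
    # caller has to remember an out-of-band contract.
    if "/" not in net:
        return False
    try:
        network = ipaddress.ip_network(net, strict=False)
        return ipaddress.ip_address(ip) in network
    except ValueError:
        return False
```

      Calling `cidr_contains("192.168.1.5", "192.168.1.0")` raises a `ValueError`, while the safe variant simply returns `False` — the difference between relying on every call site and relying on the function itself.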

    • seanmcdirmid 1 hour ago
      Most (all?) static analyzers are conservative, and driving down the false-positive rate is a constant struggle. You should never expect a false-positive rate of zero (it’s probably impossible to eliminate false positives entirely), but you shouldn’t present your false positives as successes either.
      • woodruffw 1 hour ago
        Sure, but this one doesn’t pass the sniff test. I’ve written plenty of static analysis tools (including ones that do symbolic execution), and one of the first things you do to ensure that your results are valid is create some model of tainting/reachability. Even an analysis that’s 1-callsite sensitive would have caught this and discarded it as a false positive.

        (In case it isn’t clear, I’m saying this is slop that someone whipped up and didn’t even bother to spot check.)
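        A minimal sketch of that kind of triage, using Python's `ast` module. The function names mirror the thread's example, and the guard detection is deliberately crude — it assumes, hypothetically, that the validator is called directly in an `if` test guarding the call site:

```python
import ast

def unguarded_call_sites(source: str, callee: str, guard: str) -> list[int]:
    """Return line numbers of calls to `callee` that are not inside an
    `if` statement whose test calls `guard` — a crude stand-in for a
    1-callsite-sensitive reachability filter."""
    tree = ast.parse(source)

    # Line ranges covered by an `if` whose test calls the guard function.
    guarded_spans = []
    for node in ast.walk(tree):
        if isinstance(node, ast.If):
            test_calls = {c.func.id for c in ast.walk(node.test)
                          if isinstance(c, ast.Call)
                          and isinstance(c.func, ast.Name)}
            if guard in test_calls:
                guarded_spans.append((node.lineno, node.end_lineno))

    # Any call to `callee` outside every guarded span is worth reporting;
    # guarded calls would be discarded as likely false positives.
    return [node.lineno for node in ast.walk(tree)
            if isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == callee
            and not any(lo <= node.lineno <= hi for lo, hi in guarded_spans)]
```

        Run against a snippet where one call is guarded by `is_valid_cidr` and one is not, only the unguarded call's line number comes back — the guarded call, like the one in requests, is filtered out before anyone reports it as a bug.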

  • grey-area 4 minutes ago
    I miss the days when humans submitted things they had built to this site, instead of generating long slop articles in five minutes (‘LLM‑based code synthesis—while mind-numbingly effective—’) about slop code they generated in five minutes (or, worse, hours) with foolish prompts: ‘Produce mathematics at the level of Vladimir Voevodsky, Fields Medal-winning, foundation-shaking work’.

    Should we even read this, or should we get an LLM to summarise it into a few bullet points again?

    This bit was interesting in illuminating the human authors’ credulity (assuming they believe in their own article):

    ‘The central move was elegant: stop asking only “is the system safe?”, start asking “how far is it from safety?”’

    This ersatz profundity couched in a false opposition is common in generated text. Does it have anything at all to do with the code that was generated, or is it all just convincing bullshit?

  • saithound 1 hour ago
    What if you asked your favorite AI agent to produce mathematics at the level of Vladimir Voevodsky, Fields Medal-winning, foundation-shaking work but directed toward something the legendary Nikolaj Bjørner (co-creator of Z3) could actually use?

    Well, you'd get this embarrassing mess, apparently.

  • naillang 1 hour ago
    [dead]