Intuiting Pratt Parsing

(louis.co.nz)

48 points | by signa11 2 days ago

5 comments

  • logdahl 1 hour ago
    Love Pratt parsing! Not a compiler guy, but I've spent way too many hours reflecting on parsing. I remember trying to get through the Dragon book so many times and reading all about formal grammars etc., until I landed on: recursive descent parsing + Pratt for expressions. Super simple technique, and for me it's sufficient. I'm sure it doesn't cover all cases, but for toy languages it feels like we can usually do everything with 2-token lookahead.

    Not to step on anyone's toes, I just don't feel that formal grammar theory is that important in practice. :^)
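    The Pratt-for-expressions half of that combo can be sketched in a few lines of Python. Binding powers, token handling, and the tuple AST here are my own illustrative choices, not from the comment:

```python
# A minimal Pratt-style expression parser (illustrative sketch).
# Tokens are single characters; binding powers are assumed values.

BINDING_POWER = {"+": 10, "-": 10, "*": 20, "/": 20}

def parse_expr(tokens, min_bp=0):
    """Parse a token list (consumed left to right) into a nested tuple AST."""
    left = tokens.pop(0)                 # assume the first token is a literal
    while tokens and tokens[0] in BINDING_POWER:
        op = tokens[0]
        bp = BINDING_POWER[op]
        if bp <= min_bp:                 # weaker operator: let the caller handle it
            break
        tokens.pop(0)                    # consume the operator
        right = parse_expr(tokens, bp)   # recurse with a higher floor
        left = (op, left, right)
    return left

print(parse_expr(list("1+2*3")))   # ('+', '1', ('*', '2', '3'))
```

    The `min_bp` floor is the whole trick: a recursive call only keeps consuming operators that bind more tightly than the one that triggered it, so precedence and left associativity fall out of one loop.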

    • eru 21 minutes ago
      The Dragon book is not very good, to be honest.

      It was probably decent when all you had was something like Pascal and you wanted to write a C compiler.

      Parsing and compiling and interpreting etc are all much more at home in functional languages. Much easier to understand there. And once you do, then you can translate back into imperative.

      For parsing: by default you should be using parser combinators.
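      The combinator idea can be shown in a handful of lines. This is a toy sketch in Python (not any real library's API): a parser is a function from an input string to a `(result, remaining)` pair, or `None` on failure, and combinators build bigger parsers out of smaller ones.

```python
# Toy parser combinators (illustrative sketch, names are my own).

def char(c):
    """Parser that matches the literal character c."""
    def p(s):
        return (c, s[1:]) if s.startswith(c) else None
    return p

def seq(p1, p2):
    """Run p1, then p2 on the remaining input; succeed only if both do."""
    def p(s):
        r1 = p1(s)
        if r1 is None:
            return None
        v1, rest = r1
        r2 = p2(rest)
        if r2 is None:
            return None
        v2, rest2 = r2
        return ((v1, v2), rest2)
    return p

def alt(p1, p2):
    """Try p1; if it fails, try p2 on the same input."""
    def p(s):
        return p1(s) or p2(s)
    return p

ab = seq(char("a"), char("b"))
print(ab("abc"))                          # (('a', 'b'), 'c')
print(alt(char("x"), char("a"))("abc"))   # ('a', 'bc')
```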

    • randomNumber7 1 hour ago
      It's not just for toy languages: most big compilers use recursive descent parsing.
    • signa11 52 minutes ago
      > Not to step on anyone's toes, I just don't feel that formal grammar theory is that important in practice. :^)

      exactly this! a thousand times this!

      • ogogmad 46 minutes ago
        I think even the theory of Regular Languages is overdone: You can get the essence of what NFAs are without really needing NFAs. You can get O(n) string matching without formally implementing NFAs, or using any other formal model like regex-derivatives. In fact, thinking in terms of NFAs makes it harder to see how to implement negation (or "complement" if you prefer to call it that) efficiently. It's still only linear time!

        NFA/DFA/derivative models are mostly unnecessary because ultimately, REG is just DSPACE(O(1)). That's it. Thinking in any other way is confusing the map with the territory. Furthermore, REG is extremely robust, because we also have REG = DSPACE(o(log log n)) = NSPACE(o(log log n)) = 1-DSPACE(o(log n)). For help with the notation, see here: https://en.wikipedia.org/wiki/DSPACE

    • gignico 1 hour ago
      Until you need to do more than all-or-nothing parsing :) see tree-sitter for example, or any other efficient LSP implementation of incremental parsing.
    • ogogmad 1 hour ago
      Quick other one: To parse infix expressions, every time you see "x·y | (z | w)", find the operator of least binding power: In my example, I've given "|" less binding power than "·". Anyway, this visually breaks the expression into two halves: "x·y" and "(z | w)". Recursively parse those two subexpressions. Essentially, that's it.

      The symbols "·" and "|" don't mean anything - I've chosen them to be visually intuitive: The "|" is supposed to look like a physical divider. Also, bracketed expressions "(...)" or "{...}" should be parsed first.

      Wikipedia mentions that a variant of this got used in FORTRAN I. You could also speed up my naive O(n^2) approach by using Cartesian trees, which you can build using something suspiciously resembling precedence climbing.
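      The split-at-the-weakest-operator idea above can be sketched naively in Python (the token list, the `POWER` table, and the tuple AST are my own illustrative assumptions; brackets are assumed to have been parsed out already). Splitting at the *rightmost* of the weakest operators makes same-power chains come out left-associative:

```python
# Naive split-at-least-binding-power parsing (illustrative, O(n^2) as the
# comment says). '|' binds less tightly than '·', matching the example.

POWER = {"|": 1, "·": 2}

def parse(tokens):
    """Parse a flat token list (no brackets) into a nested tuple AST by
    splitting at the rightmost occurrence of the weakest operator."""
    if len(tokens) == 1:
        return tokens[0]
    split = None
    for i, t in enumerate(tokens):
        if t in POWER and (split is None or POWER[t] <= POWER[tokens[split]]):
            split = i
    op = tokens[split]
    return (op, parse(tokens[:split]), parse(tokens[split + 1:]))

print(parse(["x", "·", "y", "|", "z", "·", "w"]))
# ('|', ('·', 'x', 'y'), ('·', 'z', 'w'))
```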

  • svat 32 minutes ago
    > I’ve read many articles on the same topic but never found it presented this way - hopefully N + 1 is of help to someone.

    Can confirm; yes it was helpful! I've never thought seriously about parsing and I've read occasionally (casually) about Pratt parsing, but this is the first time it seemed like an intuitive idea I'll remember.

    (Then I confused myself by following some references and remembering the term "precedence climbing" and reading e.g. https://www.engr.mun.ca/~theo/Misc/pratt_parsing.htm by the person who coined that term, but nevermind — the original post here has still given me an idea I think I'll remember.)

  • randomNumber7 1 hour ago
    I can recommend that anyone read Pratt's original paper. It's written in a very cool and badass style.

    https://dl.acm.org/doi/epdf/10.1145/512927.512931

  • hyperhello 45 minutes ago
    You can either use the stack in an intuitive way, or you can change the tree directly in a somewhat less intuitive way without recursion. Essentially either depth-first or breadth-first. I don't see how it matters much anymore with stacks that grow automatically, but it's good to understand.
  • priceishere 1 hour ago
    An even simpler way, imo, is explicit functions instead of a precedence table; then the code pretty much has the same structure as EBNF.

    Need to parse * before +? Begin at add, have it call parse_mul for its left and right sides, and so on.

      parse_mul() {
        left = parse_literal()
        while (is_mul_token()) {  // left associative
          consume_token()         // eat the '*'
          right = parse_literal()
          left = make_mul_node(left, right)
        }
        return left
      }

      parse_add() {
        left = parse_mul()
        while (is_add_token()) {  // left associative
          consume_token()         // eat the '+'
          right = parse_mul()
          left = make_add_node(left, right)
        }
        return left
      }
    
    Then just add more functions as you climb up the precedence levels.
    • kryptiskt 1 hour ago
      You lose versatility, though: then you can't add user-defined operators, which is pretty easy with a Pratt parser.