A recursive descent parser in #forthlang (and a nod to #rakulang)

I was reading through Schorre’s paper on Metta II again (see also the Wikipedia article).  For those that don’t know, Meta II is a metacompiler, which is to say, a language that can compile itself. It can’t, of course, be turtles all the way down, otherwise you wouldn’t be able to bootstrap it.

So it compiles down to an idealised assembly language whose instructions have to be implemented in some other language. I know that there is at least one C and lua implementation, for example. No doubt the Raku folks out there can’t permit this to go unchallenged given its grammar functionality.

A criticism of Meta II is that a fair bit of work has to go into writing a virtual machine. It’s not an excessive amount of work, though.

One weakness of Meta II, I think, is that you can’t perform arbitrary actions, only transform input to output. To give an example, Meta II might produce an instruction like:

    TST ' .OUT'

which tests the input string against the given string (in this ‘ .OUT’). This gives us a problem: we cannot, for example, create assembly code output from Meta II, as assemblers don’t allow for the inlining of strings. They need to be collected and put in a separate area. Meta II has no mechanism that allows you to do that.

Raku grammars allow you to perform actions as it parses, but often they aren’t what you want. Take, for example, a rule that might look something like

rule my-if { 
    'IF' <expr> 'THEN' <statements> 'END' 
    { do-something } 
}

The problem is that <expr> and <statements> are processed before do-something, which is not what you want. You want to be able to “mix in” processing at different levels of parsing.

The solution in Raku’s case is either to bolt on an action class to the parsed tree, or parse the resultant tree manually yourself. The former approach is the standard one, but I think that the latter approach might be easier in a lot of circumstances.

It got me to thinking as to how Forth, a David compared to Raku’s Goliath, might compete in this arena.

As a small diversion, I tried to see if I could put together a recursive descent parser for arithmetic involving just numbers, parentheses, addition and multiplication. Here’s what I whipped up using gforth:

create yytext 80 allot
variable yylen
: .yytext yytext yylen @ type ; \ print yytext
: yylex parse-name yylen ! yytext yylen @ move ; \ scan next token
: yy0 yytext c@ ;
: IS? yy0 = ; \ ( c -- t ) is first char of lexeme c?
: NUM? yy0 '0 >= yy0 '9 <= and ;

\ NOW WE DEFINE OUR ACTUAL RULES

defer ex1

: BRA yylex ex1 yylex ;
: EX3 num? if ." LDA " .yytext cr yylex else BRA then ;
: EX2 EX3 begin '* is? while yylex EX3 ." MUL" cr repeat ;
: EX1' EX2 begin '+ is? while yylex EX2 ." ADD" cr repeat ;

' ex1' IS ex1

: RUN refill drop yylex ex1 cr .s bye ;
run
1 + ( 5 + ( 2 + 3 * 4 ) * 6 )

Running it produces the following code:

LDA 1
LDA 5
LDA 2
LDA 3
LDA 4
MUL
ADD
LDA 6
MUL
ADD
ADD

Now, there are severe limitations to the approach, and there’s a little bit of “cheating” going on, with some simplifying assumptions, but I was gratified with just how little code I needed to produce the desired result.

For comparison purposes, the equivalent rules in Metta II are:

EX3 = .NUMBER .OUT('LD ' *) / '(' EX1 ')' .,EX2 = EX3 $ ('*' EX3 .OUT('MUL')) .,EX1 = EX2 $ ('+' EX2 .OUT('ADD')) .,

Part of the cheating I adopted involves using Forth’s own parser. How could this be improved?

There are a few approaches I can think of.

One might be to implement the idealised assembly language exactly as it is stated by Schorre. The main problem as I see it is with the problem of branching instructions. It’s difficult to visualise the nesting structure of the outputted branches, but if the forward branches are strictly nested, then we should be able to cope  .

A second approach might be that instead of writing a Forth that parses the idealised code, make Meta II output valid Forth code.  We then don’t have to write anything. We just run the code.

A third might be to write a “messy Meta II” parser, in the style of my Forth solution above, which takes in a clean Metta II description and produces a messy one.

A fourth solution might be to write the grammars in a more Forth-friendly style, with something like:

    : EX1 EX2 $ ( '+'EX2 => ."ADD" ) ;

You’d then define $ as an immediate word that scans the next token. ‘(‘ would also be an immediate word that does its own scanning up to and include the ‘)’. I’m sure there’d be many details to work out.

The coolness of all this is that it would then be “Forth all the way down” (well, maybe a couple of levels deep, but you wouldn’t need to implement an assembler). And you’d be able to perform any arbitrary actions you liked.

Interesting, no? I have to say, there’s something just way intriguing about Forth itself and how it can effectively reconfigure itself. Perhaps one can achieve similar expressiveness with Raku’s “slang” capabilities?

Maybe one could write a Forth as a Raku slang, and use that slang to create other slangs? Come on Rakuers, you know you want to 😉

For more mind-blowing stuff, check out the page on creating a BNF parser in Forth. I haven’t studied it yet, but it looks interesting. I am constantly amazed as to how experienced Forthers manage to achieve so much with so little.

Your Forth program isn’t finished until it looks like a Haiku. 😉

Postscript: Greetings Bulgaria! Looking at the analytics for this blog, I notice that an overwhelming number of readers today are from Bulgaria. I have no idea why they should find my blog interesting all of a sudden, but welcome anyway.

About mcturra2000

Computer programmer living in Scotland.
This entry was posted in Uncategorized. Bookmark the permalink.

3 Responses to A recursive descent parser in #forthlang (and a nod to #rakulang)

  1. Pingback: 2020.03 Trait::Traced – Weekly Rakudo News

  2. Pingback: A recursive descent parser in #forthlang (and a nod to #rakulang) - Dhruba's Blog & Online News

  3. Jeff Goff says:

    As the author of a slew of parsers in Raku (including Perl6::Parser, ANTLR::Grammar, {Regexp,Grammar}::Common, Emojicode and a few Inline modules) I’m tempted to say “Challenge accepted, sir!” but it’s currently stuck behind finishing the Perl -> Raku translator.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s