A simple BASIC interpreter in C++

Writing an interpreter is something every programmer wants to have a go at at some point. I wanted to see if I could write a recursive-descent interpreter for the BASIC programming language using modern C++. There is some malice afore-thought, as I am also investigating the use of a programming language for my spreadsheet program neoleo.

I have called my language blang, derived from Basic language. What I like about it is that it is contained in one file, and is reasonably short, at 649 lines. It has if-then-else, for-next, let, print, while-wend and limited support for strings. It has no functions, or gotos.

The code is not perfect, but it shoud be comprehensible, and act as a base for those wishing to explore interpreters.

The idea is that each syntax element has a class with two parts to it:

  • a parser, held in the object’s constructor, that is able to parse the element in question, and dispatch to other elements as required
  • an evaluator, which, obviously enough, evaluates the stored parse expression.

Here is the class declarator for a statement:

class BlangStmt: public BlangCode { // some statement
	public:
		BlangStmt();
		void eval();
	private:
		unique_ptr stmt = nullptr;
};

A top-level statement handles lets, fors, ifs and whiles. Here’s its implementation for the parsing stage:

BlangStmt::BlangStmt() {
	string sym = nextsymb.value;
	nextsymb = yylex();
	if(sym == "print")
		stmt = make_unique();
	else if(sym == "let")
		stmt = make_unique();
	else if(sym == "for")
		stmt = make_unique();
	else if(sym == "if")
		stmt = make_unique();
	else if(sym == "while")
		stmt = make_unique();
	else
		throw runtime_error("Statement unknown token: " + sym);
}

Quite small. Notice what is not handled, though: the else, fi, wend, next statements. That is delegated to another class. Else and fi are handled by the BlangIf class, for example.

The evaluator for BlangStmt is on of the most straightforward evaluators:

void BlangStmt::eval()
{ 
	stmt->eval(); 
}

It just delegates the evaluation to whatever statement it creates. Here is how the if statement is implemented:

class BlangIf: public BlangCode { // if .. then .. [else ..] fi
	public:
		BlangIf() {
			rel_ptr = make_unique();
			checkfor("then");
			bool processing_then = true;
			while(nextsymb.value != "fi") {
				if(nextsymb.value == "else") {
					processing_then = false;
					nextsymb = yylex();
					continue;
				}
				if(processing_then)
					then_stmts.push_back(make_unique());
				else
	 				else_stmts.push_back(make_unique());
			}
			nextsymb = yylex();
		}

		void eval() {
			if(rel_ptr->get_value())
				for(auto& s: then_stmts) s->eval();
			else
				for(auto& s: else_stmts) s->eval();

		}
	private:
		unique_ptr rel_ptr;
		vector then_stmts, else_stmts;
};

You can read the whole source code for how other statements, including relational operators, work.

Because it’s a recursive-descent parser, some classes are defined in terms of other classes in in mutually-recursive fashion. This is sometimes considered “bad form”, but it is just an expression of the underlying algorithms. A little bit of sheparding of class definitions and declarations are required. You need to forward-declare some classes, and sometimes separate the class prototype from its implementation.

I have in mind the idea to completely re-write the code. The parsers contain quite a lot of “grunt work”, compared to their BNF form. Here, for example, is the BNF for parsing numerical expressions:

E --> R {( "<" | "<=" | ">" | ">=" | "==" | "!=" ) R}
R --> T {( "+" | "-" ) T}
T --> P {( "*" | "/" ) P}
P --> v | "(" E ")" | "-" T

It is very simple and readable compared with the code that implements it, relatively speaking. So the plan would be to write a meta-parser: one that could take BNF expressions as a way of transforming the target language into a source tree. The code for the interpreter would be much simplified: the bulk of the code would go in the evaluator.

Except … let’s back up a bit. If we’re going to write another interpreter, then let’s see what the easiest one to implement there is. The answer to that question is: s-exprs.

So, a simplistic Scheme-like interpreter could be made that has injectible functionality. S-exprs are very easy to parse, and you can extend them.

We see, in the example above, the definition of E as being list-like. There are many similarities between the definition of E and R and T. So perhaps, instead of the BNF above, we might write:

(def-form E (repeat-end ( "<" | "<=" | ">" | ">=" | "==" | "!=" ) R)

This may facilitate the evaluator, because it’s a kind of pattern that is common, and can therefore be abstracted. Instead of the evaluator having to chug through a vector of operators and sub-forms, it could define a function that knows how to reduce these vectors, which could be passed into std::accumulator, or something.

There are some exciting possibilities that I want to explore here.
index.png

Advertisements

About mcturra2000

Computer programmer living in Scotland.
This entry was posted in Uncategorized. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s