It’s not the first time that I’ve experimented with embedding a language in my neoleo spreadsheet. I usually end up ripping it out as being a not-so-good idea.
The code is based on the GNU oleo spreadsheet written in C. I find it quite hard to follow, and I am looking for ways to reduce the code base.
One motivation for using guile is that it is relatively easy to embed. It also has a PEG parser. So I could use guile to parse cell formulae, store them in a cell reference table, and call them as required.
Neoleo already does this. The scanner itself is 1187 lines long. But that’s not the end of it, because the parsed formulae must be byte-compiled (959 lines) and decompiled (884 lines). That’s to say nothing about the complexity of the code.
I have started to explore guile’s peg-parser, and this is what I have so far:
(use-modules (ice-9 peg))
(define-peg-string-patterns
  "formula <-- func / num
func <-- funct '(' num ')'
funct <-- [a-z]+
num <-- [0-9]+")
It is by no means complete (or even correct), but it does demonstrate in my mind that considerable compression of the code is possible.
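To get a feel for what the grammar accepts, here is a minimal sketch that runs it against two inputs. The sample strings "abs(42)" and "42" and the variable names m1 and m2 are my own, chosen purely for illustration; match-pattern, peg:tree and peg:substring come from guile's (ice-9 peg) module.

```scheme
(use-modules (ice-9 peg))

;; Same toy grammar as above: a formula is either func(num) or a bare num.
(define-peg-string-patterns
  "formula <-- func / num
func <-- funct '(' num ')'
funct <-- [a-z]+
num <-- [0-9]+")

;; Hypothetical sample inputs, not taken from neoleo itself.
(define m1 (match-pattern formula "abs(42)"))
(define m2 (match-pattern formula "42"))

;; match-pattern returns #f on failure, otherwise a match record.
(display (peg:substring m1)) (newline)   ; the text that was consumed
(write (peg:tree m1)) (newline)          ; nested list tagged by nonterminal
(write (peg:tree m2)) (newline)
```

The tree should come back as a nested list tagged with the nonterminal names, which is the structure that would later be walked to build a scheme expression.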
The plan is, therefore, to parse the cell formula into a scheme procedure, compile it, and then store the compiled procedure in a hash table. Suppose a cell’s formula translates to the scheme code '(lambda (x) (+ x 13)). (Actually, that’s not quite right, because a cell formula shouldn’t depend on any parameters, but let’s not quibble about that right now.)
Then I set the cell formula using an expression like:
(hash-set! my-cell-formulae some-cell-ref (compile '(lambda (x) (+ x 13))))
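Fleshing that out a little, the sketch below stores a compiled formula and later calls it. Several things here are assumptions for illustration only: the cell reference is represented as a (row . column) pair, the formula is a parameterless thunk rather than the one-argument lambda above, and eval-cell is a hypothetical helper. The compile procedure is guile's own, from (system base compile).

```scheme
(use-modules (system base compile))

;; Table mapping cell references to compiled formulae.
(define my-cell-formulae (make-hash-table))

;; Assumption: a cell reference is a (row . column) pair.
(define some-cell-ref '(2 . 3))

;; compile turns the quoted lambda into a callable procedure.
;; Assumption: formulae take no arguments (they are thunks).
(hash-set! my-cell-formulae some-cell-ref
           (compile '(lambda () (+ 29 13))))

;; Hypothetical helper: evaluate a cell, or return #f if it has no formula.
(define (eval-cell ref)
  (let ((thunk (hash-ref my-cell-formulae ref)))
    (and thunk (thunk))))

(display (eval-cell some-cell-ref)) (newline)   ; prints 42
```

Because hash-set! and hash-ref compare keys with equal?, a freshly consed pair with the same row and column finds the same entry.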
The good news is that I can now jettison my byte-compiler. It simply isn’t necessary. Guile is doing all the heavy lifting for me.
I can see that other big chunks of code can be removed in favour of guile. For example, the file lists.cc keeps track of the cells in the spreadsheet. It is 1067 lines long. Well, I think a lot of that could be replaced by a hash table and a few auxiliary functions.
In general, the current code’s complexity score is through the roof, with functions that are too large and too heavily nested. The command input loop is a nightmare. One function in cmd.cc has a complexity score of 316, and that’s after I did some refactoring. The manpage describes complexity scores of 30 as “Difficult to maintain code”. Anything above 200 is described as “I only wish I were kidding”.
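As a rough idea of what "a hash table and a few auxiliary functions" might look like, here is a minimal sketch. It is not a port of lists.cc; the names cells, set-cell!, get-cell and cells-in-row are hypothetical, and keying cells by a (row . column) pair is an assumption.

```scheme
;; Assumption: the whole sheet lives in one table keyed by (row . column).
(define cells (make-hash-table))

;; Store a value in a cell.
(define (set-cell! row col value)
  (hash-set! cells (cons row col) value))

;; Fetch a cell's value, or #f if the cell is empty.
(define (get-cell row col)
  (hash-ref cells (cons row col) #f))

;; Collect (column . value) pairs for every occupied cell in a row.
;; The order of the result is unspecified.
(define (cells-in-row row)
  (hash-fold (lambda (key value acc)
               (if (= (car key) row)
                   (cons (cons (cdr key) value) acc)
                   acc))
             '()
             cells))

(set-cell! 1 1 10)
(set-cell! 1 2 20)
(set-cell! 2 1 30)
(display (get-cell 1 2)) (newline)   ; prints 20
```

Scanning the whole table to find one row is the simplest thing that could work; a real sheet might want a per-row index, but that is a decision for later.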
So, some simplification possible, methinks.
Wish me luck.