I have written a number of accounting packages in C, Go, Haskell, Python, Tcl (non-working), and C++. My C, Go, Python and Tcl efforts have been lost in the mists of time.
I write my input data in a bash-like format; that is to say, as a “command” with arguments separated by whitespace, using quotes if necessary, and a hash (#) to start comments. The command is actually the description of a data item.
In my experience, this is a good engineering decision, as input is easy to parse.
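As a minimal sketch of how little parsing is needed, Python's standard shlex module can split such a line directly. The record layout used here (a `txn` command followed by date, account, amount and description) is purely hypothetical and is not my actual format.

```python
import shlex

def parse_line(line):
    """Split one bash-like line into fields.

    Blank lines and comment-only lines come back as an empty list.
    """
    return shlex.split(line, comments=True)

# Hypothetical sample data; the field layout is an assumption for illustration.
sample = '''
# command, date, account, amount, description
txn 2020-01-05 bank -12.50 "coffee and cake"
'''

records = [fields for fields in map(parse_line, sample.splitlines()) if fields]
print(records)
```

Quoting and comment handling are delegated entirely to the library, so the caller only ever sees a list of fields.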
Two things happened recently:
- I wanted to combine several years' worth of data. This is a tricky process, because I close off the accounts at the end of every year. So there are opening balances that are redundant if I aggregate years together. There are also transactions duplicated from the previous year. These, too, would need to be eliminated from the aggregation process.
- I am interested in exploring John Wiegley’s ledger program. The data in my format is incompatible with the program. I had experimented with ledger in the past, but I generally abandoned my efforts for a number of reasons. Repos tended to supply only an old version, closing off did not work as I needed, the output was difficult to process downstream, and compilation required boost. In the past, I was extremely resistant to installing boost, as I am keen to avoid unnecessary bloat and dependencies. I have relented recently, though, as boost provides many tempting features.
Consequently, I am faced with the dilemma of how to aggregate all the data together, and how to transform the data.
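To make the aggregation problem concrete, here is a rough sketch in Python. It assumes, purely for illustration, that an opening balance is a record whose command is `open`, and that a transaction carried over from the previous year appears as an identical record. Neither assumption is guaranteed to match real data.

```python
import shlex

def aggregate(years):
    """Merge per-year inputs, oldest first, into one list of records.

    Assumptions (hypothetical): "open" marks an opening balance, and a
    carried-over transaction is field-for-field identical to the original.
    """
    merged = []
    earlier = set()  # records seen in previous years
    for index, text in enumerate(years):
        current = set()
        for line in text.splitlines():
            fields = shlex.split(line, comments=True)
            if not fields:
                continue  # blank line or comment
            if fields[0] == "open" and index > 0:
                continue  # opening balance is redundant after the first year
            key = tuple(fields)
            if key in earlier:
                continue  # duplicated from a previous year
            current.add(key)
            merged.append(fields)
        earlier |= current
    return merged
```

Duplicates are only checked against earlier years, so two genuinely identical transactions within the same year are both kept.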
My solution comes in two forms:
- shlex, a program I wrote in C++ that takes input in bash-like format, and prints each field as a line. It also builds a library that can be linked with other programs. I have added a feature that reprints the input in an m4-like syntax. The program was inspired by the Python module of the same name.
- m4, a general-purpose macro processor. m4 takes input text and transforms it according to a set of macros. The macros can be defined anywhere. m4 is language-agnostic; it is just a text transformer. m4 is a very old language, and should be ubiquitous on all UNIX and UNIX-like systems. m4 powers GNU autotools.
So, I convert my raw input to m4-compatible macros via shlex. All I need to do now is write m4 macros to transform the input into a suitable form. This should be relatively straightforward.
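The conversion step can be illustrated with a short Python sketch. This is not the output of my C++ shlex program, whose exact syntax is not shown here. It simply assumes that each record becomes a macro call with every argument wrapped in m4's default quote characters, and it uses a hypothetical `txn` record.

```python
import shlex

def to_m4(line):
    """Rewrite one bash-like line as an m4-style macro call.

    Returns "" for blank or comment-only lines. Assumes m4's default
    quote characters (` and '), so fields containing them are rejected.
    """
    fields = shlex.split(line, comments=True)
    if not fields:
        return ""
    name, args = fields[0], fields[1:]
    for arg in args:
        if "`" in arg or "'" in arg:
            raise ValueError(f"field needs different m4 quotes: {arg!r}")
    quoted = ", ".join("`" + arg + "'" for arg in args)
    return f"{name}({quoted})"

print(to_m4('txn 2020-01-05 bank -12.50 "coffee and cake"'))
# prints: txn(`2020-01-05', `bank', `-12.50', `coffee and cake')
```

Quoting every argument stops m4 from expanding any field that happens to match a macro name. A description containing an apostrophe would need m4's changequote, or some escaping scheme, which this sketch does not attempt.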
m4 macros can be a bit kludgey. If more power is needed, then there is a non-standard package called pyexpander. This is a macro processing language in the style of m4, but you can embed Python code. This should, theoretically, make it a more powerful, less kludgey processor. I have never tried it, though.
It is also worthwhile considering using Tcl, which seems tailor-made for transformation work. Tcl seems quite underrated.
If I had not chosen to write my inputs in a format that was easy to parse, but had used some kind of custom format, then I would be facing difficulties.
In fact, I am inclined to say that program output should be in a bash-like syntax, too. This should make it much easier for downstream processes to parse.
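A small sketch of what that could look like in Python, assuming Python 3.8 or later for `shlex.join`. The `bal` record is a hypothetical output line, not something any of my programs actually emit.

```python
import shlex

def emit(fields):
    """Render a record as one bash-like line, quoting only where needed."""
    return shlex.join(str(field) for field in fields)

# Hypothetical balance record produced by a report.
line = emit(["bal", "bank", "82.50", "current account"])
print(line)
# prints: bal bank 82.50 'current account'

# A downstream process recovers the fields with a single call.
assert shlex.split(line) == ["bal", "bank", "82.50", "current account"]
```

Because the writer and the reader share one quoting convention, fields containing spaces survive the round trip without any ad hoc escaping.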