Lessons learned from #plaintextaccounting in UNIX

I have been using UNIX to keep track of my finances for a long time. I have written quite a few packages over the years in C, C++, Haskell, Python, Tcl and Lisp.

I have used a number of publicly-available applications, including clacct, beancount and the ever-popular ledger, by John Wiegley.

The problem I have found with programs like ledger (with all due respect to John, who holds the prestigious position of maintaining Emacs!) is that they don’t do everything I want, have quite a steep learning curve, and are particularly difficult to compose in a pipeline. Ledger likes to be the final output, which makes it difficult to feed its results into further processing.

Ledger is written in C++ and, according to cloc, is 34k lines long. That is a big program! I think that ledger is being rewritten in Haskell. I have enjoyed (and rued) programming in Haskell. Nowadays, though, I do nearly all of my programming in C++. At the end of the day, C++ gets the job done, is close to the level of Python (especially now with C++17), and it’s one less dependency to install. I use Arch Linux, so I’m keen to keep stuff off my system.

In contrast, my own effort in C++ weighed in at about 3,000 lines of code. Its functionality was quite good; it even downloaded share prices, for example.

But I’ve basically abandoned all that in favour of bash/awk/sed scripts, which weigh in at about 350 lines of code. I’m sure you’ll agree that’s quite a reduction.

So the first lesson might be: roll your own system.

Create a “DSL” (domain-specific language) using tab-separated fields

The thing you definitely don’t want to do is create a parser. That will require code, code, and more code. Also, anything that might want to process your input has to be able to understand it, which means you need a parser in every program that reads it.

If, instead, you write your input data in a textual form, using tabs to separate fields, you have a trivially parseable input source. It plays well with UNIX tools, especially awk, and your commands can be quite simple. For example, I have a “command” called “ntran”, which specifies a date, a debit account, a credit account, an amount, and a description.
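To make the idea concrete, here is a minimal sketch of how tab-separated ntran records might be totalled with awk. The field order (command name first, then date, debit account, credit account, amount and description) and the function name ntran_balances are assumptions for illustration, not the actual scripts; debits are counted as positive and credits as negative.

```shell
#!/bin/sh
# Hypothetical ntran record layout (tab-separated), assumed for this sketch:
#   ntran <TAB> date <TAB> debit-acc <TAB> credit-acc <TAB> amount <TAB> description
# ntran_balances reads records on stdin and prints one
# "account<TAB>balance" line per account, sorted by account name.
ntran_balances() {
    awk '
        BEGIN { FS = OFS = "\t" }
        $1 == "ntran" {
            bal[$3] += $5    # debit account goes up
            bal[$4] -= $5    # credit account goes down
        }
        END {
            # awk iterates arrays in no particular order, hence the sort below
            for (acc in bal) printf "%s\t%.2f\n", acc, bal[acc]
        }
    ' | sort
}
```

For example, piping the single record `ntran 2018-01-05 car bank 40.00 petrol` (tab-separated) into ntran_balances should print a line for bank with -40.00 and one for car with 40.00. Nothing beyond awk and sort is needed, because the tabs do all the parsing.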

I also have “etran”, which is similar, but works with shares.

If you need something fancier (which I don’t use), for example grouping several lines into one transaction and having the balancing amount posted to an account, you can still do that. I had invented three names for grouping: trn-start (start a transaction), trn (add a transaction line) and trn-end (end a transaction). trn-end takes the running balance as the contra entry to a specified account.
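The grouping commands could be expanded into plain postings along these lines. This is only a sketch: the field layouts of trn-start, trn and trn-end below, and the function name expand_trn, are assumptions. What it illustrates is that trn-end needs no amount of its own; it books minus the running total to the named account, so each group sums to zero.

```shell
#!/bin/sh
# Hypothetical record layouts (tab-separated), assumed for this sketch:
#   trn-start <TAB> date <TAB> description
#   trn       <TAB> account <TAB> amount
#   trn-end   <TAB> account            (receives the balancing contra entry)
# expand_trn prints one "date<TAB>account<TAB>amount<TAB>description"
# posting per trn line, plus a final contra posting at trn-end.
expand_trn() {
    awk '
        BEGIN { FS = OFS = "\t" }
        $1 == "trn-start" { date = $2; desc = $3; running = 0 }
        $1 == "trn" {
            running += $3
            print date, $2, sprintf("%.2f", $3), desc
        }
        $1 == "trn-end" {
            # contra entry: minus the running total, so the group balances
            print date, $2, sprintf("%.2f", 0 - running), desc
            running = 0
        }
    '
}
```

Given a group with food 30 and hol 20 closed by `trn-end bank`, the last posting would be bank at -50.00.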

I can do this via the magic of …

awk is awesome

awk really is a programming language in its own right. My scripts now use it ubiquitously. awk has dictionaries (associative arrays), pattern matching, and arithmetic and logical operators. That’s the bulk of what you need. And if you have used tabs to separate fields within records, it is very easy to pick out and process just the records your script is interested in.
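As a small illustration of the patterns-and-fields point, the sketch below selects the records that touch one account, optionally restricted to dates matching a regular expression. The record layout and the name records_for are hypothetical. Note that an awk pattern with no action simply prints the matching record.

```shell
#!/bin/sh
# records_for ACCOUNT [DATE-REGEX]
# Prints every ntran record whose debit or credit account is ACCOUNT,
# assuming the hypothetical layout:
#   ntran, date, debit account, credit account, amount, description
# DATE-REGEX defaults to "." (any non-empty date).
records_for() {
    awk -v acc="$1" -v re="${2:-.}" '
        BEGIN { FS = "\t" }
        $1 == "ntran" && $2 ~ re && ($3 == acc || $4 == acc)
    '
}
```

Usage might be `records_for car '^2018-' < ledger.txt`, where ledger.txt stands in for whatever your input file is called.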

sed, tee, pee and pipes

awk doesn’t solve everything, of course. sed has its uses, too.

I am leaning away from using intermediate files to store data for later processing, and more towards a “pure pipe” solution. You may need tee (which writes its input to a file as well as to stdout) and pee (from moreutils, which pipes stdin to several commands).

What’s interesting here is that you can insert data capture into a pipe stream. I wrote a very small utility called “teet”, which takes two arguments: a capture flag and an output file. If the capture flag is 0, it just passes stdin to stdout. If it is set to anything else, it copies stdin to the file as well as to stdout (using tee).
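Here is a sketch of a teet-like helper written as a shell function, based only on the description above; the actual implementation isn’t shown, so treat the details as assumptions.

```shell
#!/bin/sh
# teet FLAG FILE
# FLAG "0": pass stdin straight through to stdout.
# Any other FLAG: also copy the stream into FILE (via tee).
teet() {
    if [ "$1" = 0 ]; then
        cat
    else
        tee "$2"
    fi
}
```

In a pipeline you might write `grep '^ntran' ledger.txt | teet "$DEBUG" 01-ntran.txt | sort`, switching every capture point on or off with a single (hypothetical) DEBUG variable.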

Consider how neat that is. If I need to debug something in C++, I have to start my debugger and set a breakpoint at a suitable point to see what state the data is in. With teet, I can capture the state directly to an output file, and if I don’t want to debug, I just set the flag to 0. Very convenient. And I can give the debug files meaningful names.

Don’t forget sort

My inputs are not in date order. I want them sorted primarily by date, and secondarily by input order. I achieve this using awk: I prepend a sort key as the first field, consisting of two parts, the date and a line number. The line number comes from a counter in awk.

Having done this, I can pipe the output through sort, and then strip off the key I created using sed.
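The decorate, sort, strip idiom described above might look like this. It assumes ISO dates (YYYY-MM-DD) in the second field, and the function name sort_by_date is made up. Zero-padding the counter keeps the key fixed-width so a plain text sort orders it correctly, and LC_ALL=C avoids locale-dependent collation. Because the counter makes every key unique, a stable sort isn’t needed.

```shell
#!/bin/sh
# sort_by_date: order records by date (assumed ISO YYYY-MM-DD in field 2),
# keeping the original input order for records sharing a date.
sort_by_date() {
    tab=$(printf '\t')
    awk '
        BEGIN { FS = OFS = "\t" }
        # Prepend a key: the date plus a zero-padded input line counter.
        { n++; print $2 "." sprintf("%08d", n), $0 }
    ' |
    LC_ALL=C sort -t "$tab" -k1,1 |
    sed "s/^[^$tab]*$tab//"    # strip the key added above
}
```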


One thing that I found difficult to get to grips with in John’s ledger program is the aggregation of accounts into “meta accounts”.

One approach is to use its nesting ability. That’s quite cumbersome, though, even with the aliasing facility.

Another approach is to use tagging. That’s kinda OK, but sometimes I just want a grand total.

My own approach using shell scripts is to create a text driver file. Here’s some of what it looks like:

gaap inc div
gaap inc int
gaap inc wag
gaap inc = ordinary-income

gaap exp amz
gaap exp car
gaap exp chr
gaap exp cmp
gaap exp hol
gaap exp isp
gaap exp msc
gaap exp = ordinary-expenses

gaap ordpr inc
gaap ordpr exp
gaap ordpr = ordinary-profit

What happens is that the script prints the balance of each account it sees in the third column, whilst adding that balance to a running total held in the dictionary entry named in the second column. When it sees an “=” sign in the third column, it prints that running total under the label in the fourth column. Simples.
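The driver-file logic can be sketched in a few lines of awk. The balances file format (account, then balance), the function name gaap_report, and the decision to keep leaf balances and totals in one dictionary are assumptions; that last choice is what lets a total like inc feed into ordpr.

```shell
#!/bin/sh
# gaap_report BALANCES DRIVER
# Assumptions for this sketch: BALANCES holds "account<TAB>balance" lines
# for leaf accounts; DRIVER holds lines like "gaap inc div" or
# "gaap inc = ordinary-income" (fields separated by blanks or tabs).
# Leaf balances and running totals share one dictionary, v.
gaap_report() {
    awk '
        FILENAME == ARGV[1] { v[$1] = $2; next }   # first file: leaf balances
        $1 != "gaap" { next }                      # skip blank or other lines
        $3 == "=" { printf "%s\t%.2f\n", $4, v[$2]; next }
        {
            printf "%s\t%.2f\n", $3, v[$3]         # show the account...
            v[$2] += v[$3]                         # ...and add it to the total
        }
    ' "$1" "$2"
}
```

Run it as `gaap_report balances.txt driver.txt`; both file names are placeholders. With credits negative, income totals (and a profit) come out as negative figures.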

If I wanted to keep daily track of net assets, I could grep the output for that particular field.

In conclusion

There are so many things you can achieve, often quite simply, by adhering to the UNIX philosophy.



About mcturra2000

Computer programmer living in Scotland.

3 Responses to Lessons learned from #plaintextaccounting in UNIX

  1. mike says:

    wow, sounds like a great idea to use shell scripts. Any chance I could get a copy of your scripts, would save me some time developing my own? Thanks, Mike.

    • mcturra2000 says:

      Thanks for your interest. The scripts are somewhat specialised to my purpose, so what works for me may not necessarily work for you. Some of my scripts may contain personal data, so I’ll need to screen them.

      But I’ll look into it for you, and report back in due course. What is surprising is that you really don’t need many lines of code to produce something effective. I have around 300 lines of code. Ledger has approx. 100X that amount, but doesn’t do quite what I want.

      GnuCash must be a real behemoth. It also pulls in all sorts of dependencies like WebKit.

  2. Pingback: gists for #plaintextaccounting | Mark Carter's blog
