I have been using UNIX to keep track of my finances for a long time. I have written quite a few packages over the years in C, C++, Haskell, Python, Tcl and Lisp.
I have used a number of publicly-available applications, including clacct, beancount and the ever-popular ledger, by John Wiegley.
The problem I have found with programs like ledger (and with all due respect to John, who holds the prestigious position of Emacs maintainer!) is that they don’t do everything I want, have quite a steep learning curve, and are particularly difficult to compose in a pipeline. Ledger likes to be the final output, which makes it awkward to feed its results into further processing.
Ledger is written in C++ and, according to cloc, is 34k lines long. That is a big program! I believe ledger is being re-written in Haskell. I have enjoyed (and rued) programming in Haskell. Nowadays, though, I do nearly all of my programming in C++. At the end of the day, C++ gets the job done, is close to Python in expressiveness (especially now with C++17), and it’s one less dependency to install. I use Arch Linux, so I’m keen to keep extra packages off my system.
In contrast, my own effort in C++ weighed in at about 3000 lines of code. Its functionality was quite good; it even downloaded share prices, for example.
But I’ve basically abandoned all that in favour of bash/awk/sed scripts. The scripts weigh in at about 350 lines of code. I’m sure you’ll agree, that’s quite a reduction.
So the first lesson might be: roll your own system.
Create a “DSL” (domain-specific language) using tab-separated fields
The thing you definitely don’t want to do is create a parser. That will require code, code, and more code. Also, anything that might want to process your input has to be able to understand it, which means writing a parser for each program.
If, instead, you create your input data in a textual form, using tabs to separate fields, you have a trivially-parseable input source. It plays well with UNIX tools, especially awk. Your commands can be quite simple. For example, I have a “command” called “ntran”, which specifies a date, a debit account, a credit account, an amount, and a description.
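As a sketch, one ntran record is just a tab-separated line, and awk can pick it apart with no parser at all. The field order follows the description above; the accounts, amount and description are invented for the demo:

```shell
# One "ntran" record: command, date, debit account, credit account,
# amount, description -- tab-separated (values invented for this demo).
printf 'ntran\t2024-03-01\texp:groceries\tassets:bank\t42.50\tweekly shop\n' |
awk -F'\t' '$1 == "ntran" { printf "%s  %s -> %s  %.2f  %s\n", $2, $4, $3, $5, $6 }'
```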
I also have “etran”, which is similar, but works with shares.
If you need something fancier (which I don’t), for example grouping a set of transactions together and having the balancing amount posted to an account, you can still do that. I invented three names for grouping: trn-start (start a transaction), trn (add a transaction line) and trn-end (end a transaction). trn-end posts the running balance as the contra to a specified account.
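A minimal awk sketch of that grouping scheme — the field layout here is my assumption (trn takes an account and an amount; trn-end takes just the contra account), and the sample data is invented:

```shell
# trn lines accumulate a running balance; trn-end posts its negation
# to the contra account (field layout assumed for this sketch).
printf 'trn-start\t2024-03-02\ntrn\texp:food\t30\ntrn\texp:fuel\t20\ntrn-end\tassets:bank\n' |
awk -F'\t' '
  $1 == "trn-start" { date = $2; bal = 0; next }
  $1 == "trn"       { bal += $3; printf "%s\t%s\t%.2f\n", date, $2, $3; next }
  $1 == "trn-end"   { printf "%s\t%s\t%.2f\n", date, $2, -bal }'
```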
I can do this via the magic of …
awk is awesome
awk really is a programming language in its own right. My scripts now use it ubiquitously. awk has dictionaries, pattern matching, and arithmetic and logical operators. That’s the bulk of what you need. And, if you have used tabs to separate fields within records, it is very easy to pick out the records your script is interested in.
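For example, a few lines of awk suffice to total up balances per account from ntran records like the ones described earlier (the input lines here are invented for the demo):

```shell
# Double-entry totals per account: debit the third field, credit the
# fourth, amount in the fifth. awk dictionaries do all the work.
printf 'ntran\t2024-03-01\texp:food\tassets:bank\t30\tshop\nntran\t2024-03-02\texp:food\tassets:bank\t12\tmore shopping\n' |
awk -F'\t' '
  $1 == "ntran" { bal[$3] += $5; bal[$4] -= $5 }     # dictionary keyed by account
  END { for (a in bal) printf "%s\t%.2f\n", a, bal[a] }' |
sort
```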
sed, tee, pee and pipes
awk doesn’t solve everything, of course. sed has its uses, too.
I am leaning away from using intermediate files to store data for later processing, and towards a “pure pipe” solution. You may need tee (which can store data in a file as well as printing it to stdout) and pee (from moreutils, which passes stdin to several processes).
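tee is the simpler of the two: it keeps the pipe flowing while also capturing a copy to a file. A trivial demo (the capture filename is a throwaway for the example):

```shell
# tee copies stdin to the named file AND to stdout, so the
# pipeline downstream (wc -l) still sees all three lines.
printf 'a\nb\nc\n' | tee /tmp/ledger_capture | wc -l
```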
What’s interesting here is that you can insert data capture into a pipe stream. I wrote a very small util called “teet”. teet takes two arguments: a capture flag and an output file. If the capture flag is set to 0, it just passes stdin to stdout. If it is set to anything else, it copies stdin to the file as well as to stdout (using tee).
Consider how neat that is. If I need to debug stuff in C++, I have to start my debugger and set my breakpoint at a suitable point to see what state the data is in. But if I use my teet command, I can capture state directly to an output file. Or, if I don’t want to debug, I just set the flag to 0. Very convenient. And I can give the debug files meaningful names.
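A sketch of what such a teet could look like as a tiny shell script — this is my reconstruction from the description above, not the original:

```shell
#!/bin/sh
# teet CAPTURE FILE
#   CAPTURE = 0 : pass stdin straight through to stdout.
#   otherwise   : copy stdin to FILE as well as stdout, via tee.
if [ "$1" = 0 ]; then
    cat
else
    tee "$2"
fi
```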
Don’t forget sort
My inputs are not in date order. I want them sorted by date primarily, and by input order secondarily. I achieve this using awk: I prepend a sort key as the first field of each record, consisting of two parts, the date and a line number. The line number comes from a counter in awk.
When I do this, I can pipe the output to the sort command, and then remove the key I created using sed.
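Sketched end to end (with invented sample dates), the pipeline looks like this: awk prepends a date-plus-line-number key, sort orders on it, and sed strips it off again.

```shell
printf '2024-03-05\tb\n2024-03-01\ta\n2024-03-05\tc\n' |
awk -F'\t' '{ printf "%s:%06d\t%s\n", $1, NR, $0 }' |  # key = date + input line number
sort |                                                 # primary: date; secondary: input order
sed 's/^[^\t]*\t//'                                    # strip the key again (GNU sed \t)
```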
One thing that I found difficult to get to grips with in John’s ledger program is in the aggregation of accounts into “meta accounts”.
One approach is to use its nesting ability. That’s quite cumbersome, though, even with the aliasing facility.
Another approach is to use tagging. That’s kinda OK, but sometimes I just want a grand total.
My own approach using shell scripts is to create a text driver file. Here’s some of what it looks like:
    gaap inc div
    gaap inc int
    gaap inc wag
    gaap inc = ordinary-income
    gaap exp amz
    gaap exp car
    gaap exp chr
    gaap exp cmp
    gaap exp hol
    gaap exp isp
    gaap exp msc
    gaap exp = ordinary-expenses
    gaap ordpr inc
    gaap ordpr exp
    gaap ordpr = ordinary-profit
    ...
What happens is that the script prints the balance for any account it sees in the third column, whilst accumulating that balance into the dictionary variable named in the second column. When it sees an “=” sign in the third column, it prints the accumulated balance under the name in the fourth column. Simples.
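To make that concrete, here is a toy awk replay of the driver logic. The “bal account amount” lines feeding it are my own invention for the demo; in reality the balances would come from earlier in the pipeline:

```shell
# "bal" lines (invented format) supply account balances; "gaap" driver
# lines print each account's balance while accumulating it into the
# variable named in column two, and "=" reports the accumulated total.
printf 'bal div 100\nbal int 50\ngaap inc div\ngaap inc int\ngaap inc = ordinary-income\n' |
awk '
  $1 == "bal"               { bal[$2] = $3; next }
  $1 == "gaap" && $3 != "=" { acc[$2] += bal[$3]; printf "%s\t%.2f\n", $3, bal[$3]; next }
  $1 == "gaap" && $3 == "=" { printf "%s\t%.2f\n", $4, acc[$2] }'
```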
If I wanted to keep a daily track of net assets, I could grep for that particular field.
There are so many things you can achieve, often quite simply, by adhering to the UNIX philosophy.