Fortran vs Numpy vs Julia, and HDF5

My background is as follows:
* I used Fortran for my PhD. Typical programs were about 100 lines long (but they were difficult lines!), and I haven’t touched it much since then
* Extensive Python experience, and I’ve recently tried to convert to “python 3 only”. Recent start with numpy, and all that matplotlib goodness
* Just started learning Julia

I’ve recently been much more interested in financial data analytics – hence my exploration of all things numerical.

I have been using HDF5 as a container for my data. It, too, is something I’m new to. I was beguiled by the sales pitch that it is a good way to store data as compared to, say, CSV (which is how my data is captured). To be honest, my CSV files are small, so using the HDF5 file format to store the data is probably akin to using a sledgehammer to crack a nut. There’s also a fair bit of learning involved in working out how to use HDF5. I’m using it as a “fun” learning exercise, but if I were to analyse cost versus benefits dispassionately, I would likely conclude that HDF5 wasn’t worth the effort; for my case, at least.

I’ve found that HDF5 does have a decent benefit, though: it is a standardised format. The CSV files I obtain have all sorts of irregularities to them. In fact, it seems that the sources I have obtained them from have tried every permutation they can think of for subverting the CSV file format. At least when the data has been massaged into HDF5, there’s no room for confusion as to how the data should be accessed.
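To give a flavour of the “massaging” step, here is a minimal sketch in Python using only the standard library. The two sample sources, their delimiters, and their date layouts are made up for illustration; they are assumptions, not my actual data. The idea is simply to force every source into one uniform (ISO date, price) shape, which is the form that would then be written out to HDF5 (the writing step is not shown here).

```python
import csv
import io
from datetime import datetime

# Hypothetical date layouts; extend to match whatever the real sources use.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(text):
    """Return an ISO date string, trying each known layout in turn."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


def normalise(raw_text, delimiter=","):
    """Yield (iso_date, price) tuples from one irregular CSV source."""
    reader = csv.reader(io.StringIO(raw_text), delimiter=delimiter)
    for row in reader:
        if len(row) < 2:
            continue  # blank or truncated line
        try:
            date = parse_date(row[0])
        except ValueError:
            continue  # header row or junk: first field is not a date
        # Strip thousands separators and stray whitespace before converting.
        price = float(row[1].replace(",", "").strip())
        yield (date, price)


# Two made-up sources that disagree on delimiter, date layout, quoting,
# headers and blank lines.
SAMPLE_A = 'Date,Close\n2015-01-02,"1,234.50"\n\n2015-01-05,1240.00\n'
SAMPLE_B = "02/01/2015; 99.5\n05/01/2015; 101.25\n"

rows_a = list(normalise(SAMPLE_A))
rows_b = list(normalise(SAMPLE_B, delimiter=";"))
```

Once both sources come out in the same shape, whatever reads the data later no longer needs to know which quirks the original file had.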

I found the Python scientific libraries to have a steep learning curve. Many of the libraries don’t work the way one would conceptually expect them to. This is not a slight on the Python libs. It just seems the way of the world that in programming, a certain alignment of stars is required to persuade the computer to behave in a way that you want.

Julia is turning out to be an interesting language, and I like the way that dealing with arrays is usually more concise than with Python. Benchmarks tend to show Julia is faster than Python, but I think that depends on what you measure. The problem tends to be when you load a module for the first time: Julia has to compile it, and that tends to be a slow old process. I’m not the only one who has experienced this. I saw a YouTube video where Julia was demonstrated in an IPython notebook. If you restart the kernel, then you are inevitably hanging around for some crucial library to recompile. The demonstrator was obviously a little frustrated by this. That is understandable. In fact, I’m pretty sure that it will hurt Julia’s adoption. I am aware that one can build a Julia executable with your favourite modules compiled in, but it’s not an ideal solution.

It’s not an unfixable problem. Hopefully Julia will at some point in the future create bytecode, in the same way that Python does. Then Julia should feel much faster than Python.

HDF5 was quite tedious to use under Fortran, although I am sure I will become more comfortable with it as time progresses.

But here’s the thing. Whilst there is some more bookkeeping under Fortran, I’m not actually sure that Python and Julia bring much to the table in being able to reduce code size. After all, Fortran does provide array arithmetic. Sometimes the complexity of the algorithm means that a supposedly more expressive programming language won’t reduce the line count.
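As a small illustration of the array arithmetic point, here is a sketch in Python with numpy that computes simple period-on-period returns from a price series. The prices are made-up numbers, and the variable names are just for illustration.

```python
import numpy as np

# Made-up closing prices, purely for illustration.
prices = np.array([100.0, 102.0, 101.0, 105.0])

# Whole-array arithmetic: divide each price by the previous one, subtract 1.
# No explicit loop is needed.
returns = prices[1:] / prices[:-1] - 1.0

mean_return = returns.mean()
print(returns, mean_return)
```

The core calculation is a single line. With Fortran array sections the same thing can be written as `returns = prices(2:n) / prices(1:n-1) - 1.0`, which is also a single line, so the saving from the “more expressive” language here is mostly in the declarations around it.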

Food for thought. Perhaps Fortran does suffice after all. It will certainly run quicker.


About mcturra2000

Computer programmer living in Scotland.