## Faster division on processors that don’t support division

I’ve recently become more interested in the ARM architecture, and programming it in assembler. It’s a bit of a slow process for me, and I’m eagerly awaiting a book from Amazon. It’s due tomorrow, so hopefully it will be a nice xmas present for me.

Imagine my surprise when I found that ARM processors don’t support division, as it is too complicated to implement in hardware for the perceived benefit!

So hmm. The obvious way to solve the problem is by repeated accumulation of the divisor. Not exactly fast!

So I thought of a way of speeding it up. The trick is to start with an approximate value of 2. Given an numerator n and denominator d, first find p and q such that

2^(p-1) < n <= 2^p

2^(q-1) < d < 2^ q

Assume n>d>0. You can tweak things around a little to cope with special cases where this is not true.

It follows that n/d >= 2^(p-q). This is great, because it sets a lower bound for the result, meaning that you can short-circuit a lot of the work. What’s really good about it is that multiplying an dividing by 2 is fast, and possible, on chips that don’t have division on them, like the ARM. They do have left and right register shifts, however, which is why I chose powers of 2.

By way of illustration, suppose we want to divide 200 by 3.

So n = 200 = 2^7 + 72, and d = 2^1+1. To get the values 7 and 1, you can keep doing right-shifts on the numbers until you’re left the value 1, or 0 if you prefer.

Aha, now you know that n/d >= 2^(7-1) = 2^6 = 64. So you start with the value 3*64 as your first guess as to what n/d is, and then do further additions of 3 until is equals or exceeds n. You’ll then see that floor(200/3)= 66.

It doesn’t eliminate all steps, of course, but it should eliminate a lot of donkeywork.

An improvement: Actually, I just thought, a better way to it is to find the largest q for which (2^q) * d >= n

This has the advantage that you only need to do a left-shift on the denominator and a compare at each step. If you go too far, then you just do a final right-shift. That’s your starting point. ARM might be able to do shifts conditionally, too, which may result in a speedup. 