iczelia :: 13 July 2023

Some musings about register allocation.

Before you read this, remember that this is a highly hypothetical scenario completely disconnected from how CPUs work. Registers do not have to be 8 bytes, there are caches for registers, etc...

Consider a special variant of register allocation where outside of wanting to minimise the amount of spills interprocedurally, we also want to put another constraints on how the registers are allocated. For example, instead of using the first available temporary register as many code generators for MIPS do (unfortunately further away from optimal on x86 due to register requirements for div, etc...); we want to pick a register such that the last referenced register is the closest to the currently considered register. In particular, consider some function f(x, n, m) which models the runtime of the two operand instruction being currently executed. Long division p <- p/q has the computational complexity of O(q^2), hence our function f(x,n,m)=(xm)^2, where x signifies the cost of loads from p and q. Loading Q after having loaded Q again is cheap (caching), but loading P after having loaded Q or vice versa is more expensive. The cost x is defined as |R_p - R_q| - i.e. the distance between two registers in the register file. This may come useful in scenarios where registers are large and the computer has multi-level cache that quickly evicts unused data and eagerly caches new data.

For example: div r1, r4 has the cost factor |1-4|=3 further applied to the worst-case (r4)^2 amount of operations - the instruction would take 3*(r4)^2 cycles. The cost factor of div r2, r1 would be only |2-1|=1, hence the instruction takes only (r1)^2 cycles.

Hence the question posed is: What is the most efficient algorithm to model this particular register preference? The answer to this question would probably provide answers to other similar questions regarding register preference that are ubiquitous on platforms where the exact register you choose for a particular variable does matter (e.g. x86; due to how certain instructions like to store output/take input from a hardcoded register).

A crucial thing to notice is that the problem of register allocation with a preference on the closest register available is essentially equivalent to a modified variation of the 1-D Travelling Salesman Problem where every node can be (and is encouraged to be) visited multiple times if possible!

It just so happens that compilers appear to emit low numbered registers, but that's due to preferential treatment for volatile registers, as used by calling conventions and then coalesced etc. since using a higher numbered (typically where non-volatile/callee-save live) amounts to spill + reload insertion. In graph colouring, one could use a heuristic to select free colours as closest to an already selected colour. Hence the compiler backend developer's solution to the problem would be prioritising colours closest to the direct neighbours already assigned colours assuming an ordering to the colours, obviously where colours are numbers to produce a non-optimal but relatively good result.

Notice how similar this approach is to the nearest neighbour search approximate solution to the Travelling Salesman Problem. Hence, to connect the dots: I think that this particular solution is the best one considering speed and how close the output is to being optimal. An optimal analogue would be the exhaustive search TSP solution, while a considerably less optimal but way faster in terms of computational complexity option could be applying the Christofides algorithm.

If you are still wondering what is the use of it, I have to disappoint you and refer you to reading and comprehending this article: https://esolangs.org/wiki/Brainfuck